
Analysis

The analysis of the data can be done after the pipeline has run.

full pipeline

Getting an analysis report

The analysis report combines the sequencing and flow cytometry data. It contains all counts and frequencies from the flow data, along with the frequencies of VRC01-class sequences among the sequenced cells. These frequencies are then combined to give a final frequency of VRC01-class cells within a given cell phenotype.
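As a rough illustration of how a flow frequency and a sequencing frequency can be combined, the sketch below multiplies the two. The function name `combined_percent` is hypothetical, the input numbers are made up, and treating the final frequency as a simple product is an assumption, not something taken from the g00x source.

```python
def combined_percent(percent_epitope_specific: float, percent_vrc01_class: float) -> float:
    """Percent of cells in a phenotype that are epitope-specific and VRC01-class.

    Assumption (not confirmed by g00x): the combined value is the product
    of the flow-derived and sequencing-derived fractions.
    """
    return percent_epitope_specific * percent_vrc01_class / 100.0


# Made-up inputs: 0.5% of IgG+ B cells are epitope-specific (flow), and
# 40% of the sequenced epitope-specific IgG cells are VRC01-class (sequencing).
result = combined_percent(0.5, 40.0)
print(f"{result:.3f}% of IgG+ B cells are VRC01-class")
```

Under this assumption, a missing sequencing frequency (`nan`) propagates to the combined value, which matches the `nan` entries seen in the pivoted tables for samples without sequencing data.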

The following produces an analysis report by combining the sequencing and flow data, either from the command line or from Python:

$ g00x g002 analysis report -s g002/G002/output/final_df.feather -f g002/G002/output/flow_output.feather -o g002/G002/output/flow_and_sequencing

import pandas as pd

from g00x.data import Data
from g00x.analysis.report import combine_seq_and_flow

data = Data()
sequencing_dataframe_path = "g002/G002/output/final_df.feather"
flow_dataframe_path = "g002/G002/output/flow_output.feather"

# read the sequencing and flow dataframes from their feather files
sequencing_dataframe = pd.read_feather(sequencing_dataframe_path)
flow_dataframe = pd.read_feather(flow_dataframe_path)

# generate three different dataframes: short names, long names, and long form
seq_and_flow_df, seq_and_flow_df_long_name, seq_and_flow_df_long_form = combine_seq_and_flow(
    data, sequencing_dataframe, flow_dataframe
)

This will output three files: flow_and_sequencing.feather, flow_and_sequencing_long_names.feather, and flow_and_sequencing_long_form.feather.

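The long form dataframe has one row per measurement, with the kind of measurement in value_type, the measurement itself in value, and its labels in short_name and long_name:

|      | run_purpose | run_date   | pubID    | ptid    | group | weeks | visit_id | probe_set | sample_type | value_type | value  | short_name                   | long_name                    | pbmc_gate_expression |
|------|-------------|------------|----------|---------|-------|-------|----------|-----------|-------------|------------|--------|------------------------------|------------------------------|----------------------|
| 319  | PreS        | 2022-10-24 | G002-516 | G002516 | 1     | 16    | V270     | Cg28v2    | PBMC        | count      | 418948 | B cells_cell_count           | B cells_cell_count           | P4                   |
| 2326 | PreS        | 2022-10-31 | G002-884 | G002884 | 3     | -5    | V091     | eODGT8    | PBMC        | count      | 18298  | IgG-IgA-/IgM+/KO-_cell_count | IgG-IgA-/IgM+/KO-_cell_count | P25                  |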

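The long names dataframe is pivoted so that each sample is one row and each measurement is a column labeled with its long name: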

run_purpose run_date pubID ptid group weeks visit_id probe_set sample_type B cells_cell_count CD19+/CD20-/CD27+CD38+/Antigen++/KO-_cell_count CD19+/CD20-/CD27+CD38+/Antigen++_cell_count CD19+/CD20-/CD27+CD38+/KO-/Epitope++_cell_count CD19+/CD20-/CD27+CD38+/KO-_cell_count CD19+/CD20-/CD27+CD38+_cell_count CD19+/CD20-_cell_count CD19+_cell_count Dump-_cell_count Frequency of IGD- sequences that are VRC01-class Frequency of IGHA sequences that are VRC01-class Frequency of IGHD sequences that are VRC01-class Frequency of IGHG sequences that are VRC01-class Frequency of IGHM sequences that are VRC01-class Frequency of VRC01-class sequences among IgA Frequency of VRC01-class sequences among IgD- Frequency of VRC01-class sequences among IgG Frequency of VRC01-class sequences among IgM IgD+ B cells_cell_count IgD+/Antigen++/KO-_cell_count IgD+/Antigen++_cell_count IgD+/KO-/Epitope++_cell_count IgD+/KO-_cell_count IgD- B cells_cell_count IgD-/Antigen++/KO-_cell_count IgD-/Antigen++_cell_count IgD-/KO-/Epitope++_cell_count IgD-/KO-_cell_count IgG-IgA-/IgM+/Antigen++/KO-_cell_count IgG-IgA-/IgM+/Antigen++_cell_count IgG-IgA-/IgM+/KO-/Epitope++_cell_count IgG-IgA-/IgM+/KO-_cell_count IgG-IgA-/IgM+_cell_count IgG-IgA-_cell_count IgG-IgM-/IgA+/Antigen++/KO-_cell_count IgG-IgM-/IgA+/Antigen++_cell_count IgG-IgM-/IgA+/KO-/Epitope++_cell_count IgG-IgM-/IgA+/KO-_cell_count IgG-IgM-/IgA+_cell_count IgG-IgM-IgD-_cell_count IgM-IgA-/IgG+/Antigen++/KO-_cell_count IgM-IgA-/IgG+/Antigen++_cell_count IgM-IgA-/IgG+/KO-/Epitope++_cell_count IgM-IgA-/IgG+/KO-_cell_count IgM-IgA-/IgG+_cell_count IgM-IgA-_cell_count Lymphocytes_cell_count Number of IGD- sequences that are VRC01-class Number of IGHA sequences that are VRC01-class Number of IGHD sequences that are VRC01-class Number of IGHG sequences that are VRC01-class Number of IGHM sequences that are VRC01-class Percent IgA^{+}KO^- among Ag^{++} Percent IgD^{-}KO^{-} among Ag^{++} Percent IgG^{+}KO^- among Ag^{++} Percent IgM{+}KO^- among 
Ag^{++} Percent PB{+}KO^- among Ag^{++} Percent antigen-specific among IgD^- Percent antigen-specific among IgG^{+} Percent antigen-specific among IgM Percent epitope-specific (KO^-Ag^{++}) among IgA^{+} Percent epitope-specific (KO^-Ag^{++}) among IgD^- Percent epitope-specific (KO^-Ag^{++}) among IgG^{+} Percent epitope-specific (KO^-Ag^{++}) among IgM Percent epitope-specific (KO^-Ag^{++}) among PB Percent of antigen-specific among IgA^{+} Percent of antigen-specific among PB Singlets_cell_count
46 PreS 2022-10-31 G002-884 G002884 3 16 V500 eODGT8 PBMC 415845 0 5 0 1915 1931 4146 414462 472678 nan nan nan nan nan nan nan nan nan 255368 117 177 117 253414 161568 391 1504 394 158975 3 4 3 16985 17142 25395 13 22 13 38244 38507 46508 340 1383 340 92983 94964 102932 555794 nan nan nan nan nan 59.0909 25.9973 24.5842 0.0176626 0 0.930877 1.45634 75 0.0337601 0.24386 0.35803 0.0175009 0 0.0571325 0.258933 550934
2 PreS 2022-08-25 G002-516 G002516 1 8 V200 eODGT8 PBMC 1.2981e+06 1 1 1 15503 16641 121174 1.27163e+06 1.34031e+06 nan nan nan nan nan nan nan nan nan 864372 456 718 409 801485 433730 554 1070 469 404609 38 54 35 69781 74779 121496 47 97 51 96230 103053 148151 336 709 288 189984 203562 254258 1.37136e+06 nan nan nan nan nan 48.4536 51.7757 47.3907 0.0544561 100 0.246697 0.348297 64.8148 0.0494891 0.108132 0.14148 0.0468046 0.00600925 0.0941263 0.00600925 1.36246e+06

The short names dataframe is pivoted in the same way, with each column labeled by its short name:

run_purpose run_date pubID ptid group weeks visit_id probe_set sample_type B cells_cell_count CD19+/CD20-/CD27+CD38+/Antigen++/KO-_cell_count CD19+/CD20-/CD27+CD38+/Antigen++_cell_count CD19+/CD20-/CD27+CD38+/KO-/Epitope++_cell_count CD19+/CD20-/CD27+CD38+/KO-_cell_count CD19+/CD20-/CD27+CD38+_cell_count CD19+/CD20-_cell_count CD19+_cell_count Dump-_cell_count IgD+ B cells_cell_count IgD+/Antigen++/KO-_cell_count IgD+/Antigen++_cell_count IgD+/KO-/Epitope++_cell_count IgD+/KO-_cell_count IgD- B cells_cell_count IgD-/Antigen++/KO-_cell_count IgD-/Antigen++_cell_count IgD-/KO-/Epitope++_cell_count IgD-/KO-_cell_count IgG-IgA-/IgM+/Antigen++/KO-_cell_count IgG-IgA-/IgM+/Antigen++_cell_count IgG-IgA-/IgM+/KO-/Epitope++_cell_count IgG-IgA-/IgM+/KO-_cell_count IgG-IgA-/IgM+_cell_count IgG-IgA-_cell_count IgG-IgM-/IgA+/Antigen++/KO-_cell_count IgG-IgM-/IgA+/Antigen++_cell_count IgG-IgM-/IgA+/KO-/Epitope++_cell_count IgG-IgM-/IgA+/KO-_cell_count IgG-IgM-/IgA+_cell_count IgG-IgM-IgD-_cell_count IgM-IgA-/IgG+/Antigen++/KO-_cell_count IgM-IgA-/IgG+/Antigen++_cell_count IgM-IgA-/IgG+/KO-/Epitope++_cell_count IgM-IgA-/IgG+/KO-_cell_count IgM-IgA-/IgG+_cell_count IgM-IgA-_cell_count Lymphocytes_cell_count Singlets_cell_count num_IGHA_vrc01_class_sequences num_IGHD_vrc01_class_sequences num_IGHG_vrc01_class_sequences num_IGHM_vrc01_class_sequences num_igdneg_vrc01_class_sequences percent_IGHA_vrc01_class_sequences percent_IGHD_vrc01_class_sequences percent_IGHG_vrc01_class_sequences percent_IGHM_vrc01_class_sequences percent_ag_among_iga percent_ag_among_igd_neg percent_ag_among_igg percent_ag_among_igm percent_ag_among_pb percent_ep_among_iga percent_ep_among_igd_neg percent_ep_among_igg percent_ep_among_igm percent_ep_among_pb percent_igako_among_ag percent_igdneg_vrc01_class_sequences percent_iggko_among_ag percent_igmko_among_ag percent_ko_among_ag_igd_neg percent_pbko_among_ag percent_vrc01_among_iga percent_vrc01_among_igd_neg percent_vrc01_among_igg percent_vrc01_among_igm
85 PreS 2022-11-08 G002-852 G002852 1 8 V200 eODGT8 PBMC 992632 3 3 3 1357 1374 8892 982808 1.02292e+06 705541 833 1131 833 698255 289703 930 2231 937 285912 44 57 44 30308 30561 44012 43 82 44 84999 85750 98725 634 1548 634 147423 149684 162491 1.0971e+06 1.08986e+06 nan nan nan nan nan nan nan nan nan 0.0956268 0.770099 1.03418 77.193 0.218341 0.051312 0.323435 0.423559 0.143974 0.218341 52.439 nan 40.9561 0.145176 41.6853 100 nan nan nan nan
84 PreS 2022-11-08 G002-852 G002852 1 4 V160 eODGT8 PBMC 1.10867e+06 79 131 79 1691 1782 10146 1.09833e+06 1.15407e+06 820575 2017 3147 2017 813873 290959 1817 3937 1830 287026 80 145 80 33509 33766 47452 78 119 80 84703 85268 98461 792 1659 792 143728 145494 158611 1.23712e+06 1.2314e+06 nan nan nan nan nan nan nan nan nan 0.13956 1.35311 1.14025 55.1724 7.35129 0.0938218 0.628955 0.544352 0.236925 4.43322 65.5462 nan 47.7396 0.238742 46.1519 60.3053 nan nan nan nan
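The relationship between the long form and the pivoted tables can be sketched with pandas. This is a minimal illustration on made-up rows that reuse a few of the column names above; it is not the code that `combine_seq_and_flow` runs, and the values are invented.

```python
import pandas as pd

# Toy long-form rows; the values are made up for illustration.
long_form = pd.DataFrame(
    {
        "ptid": ["G002516", "G002516", "G002884", "G002884"],
        "visit_id": ["V270", "V270", "V091", "V091"],
        "probe_set": ["Cg28v2", "Cg28v2", "eODGT8", "eODGT8"],
        "short_name": ["B cells_cell_count", "Singlets_cell_count"] * 2,
        "value": [418948.0, 500000.0, 300000.0, 350000.0],
    }
)

# Pivot to one row per sample and one column per measurement.
wide = long_form.pivot_table(
    index=["ptid", "visit_id", "probe_set"],
    columns="short_name",
    values="value",
).reset_index()
wide.columns.name = None  # drop the leftover "short_name" axis label

print(wide)
```

Pivoting on `long_name` instead of `short_name` would give the long names layout; the metadata columns used as the index here are an assumption about what identifies a sample.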

Count

Count the number of samples we currently have.

$ g00x g002 analysis count -f g002/G002/output/flow_output.feather -o count

This will output:

table current samples
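A comparable count can be sketched directly in pandas. The example below uses a made-up long-form flow table and assumes that a sample is identified by `ptid`, `visit_id`, and `probe_set`; the grouping the `count` command actually uses may differ.

```python
import pandas as pd

# Toy flow rows: several gates per sample, values made up.
flow = pd.DataFrame(
    {
        "ptid": ["A", "A", "B", "A"],
        "visit_id": ["V200", "V200", "V200", "V270"],
        "probe_set": ["eODGT8", "eODGT8", "eODGT8", "Cg28v2"],
        "short_name": ["B cells_cell_count", "Singlets_cell_count",
                       "B cells_cell_count", "B cells_cell_count"],
        "value": [100.0, 200.0, 150.0, 120.0],
    }
)

# Collapse the per-gate rows to one row per sample.
samples = flow[["ptid", "visit_id", "probe_set"]].drop_duplicates()

# Count samples per visit and probe set.
counts = (
    samples.groupby(["visit_id", "probe_set"])
    .size()
    .reset_index(name="n_samples")
)

print(counts)
```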