Gbg MS-data debug


andreas.goteson@gu.se
UPDATED: 191004

Import data and setup


Quick overview of data

17022 peptides (2250 proteins), from 144 samples + 16 controls = 160 channels over 16 sets (10-plex).

Output from PD2.2 is not scaled or normalized.

Prepare the data


LOD debug


Here I compiled all data to see the general distribution.


Some notes immediately:

  • Roughly normal
  • The general data distribution is skewed to the left AND there is a left tail
  • There is a low peak, probably at the LOD

What about raw abundances?


No, LOD's.

Median abundance = 68.4 (this output is not scaled!)

Are any channels enriched for LOD values?
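A minimal sketch of how this check could be done, assuming the ratios are available as (channel, ratio) pairs and that LOD ratios are stored as exactly 0.01. The function name and the toy data are invented for illustration; this is not the code behind the outputs in these notes.

```python
from collections import Counter

LOD_RATIO = 0.01  # ratio value that these notes identify as the LOD placeholder


def lod_fraction_per_channel(records, lod=LOD_RATIO):
    """Return {channel: fraction of non-missing ratios equal to the LOD value}.

    `records` is an iterable of (channel, ratio) pairs; ratio may be None.
    """
    total = Counter()
    at_lod = Counter()
    for channel, ratio in records:
        if ratio is None:
            continue  # missing ratios are not counted either way
        total[channel] += 1
        if abs(ratio - lod) < 1e-9:
            at_lod[channel] += 1
    return {ch: at_lod[ch] / total[ch] for ch in total}


# Toy data, invented for illustration only
toy = [("126", 0.01), ("126", 0.8), ("127N", 0.01), ("127N", 0.01),
       ("130N", 1.1), ("130N", 0.9)]
fractions = lod_fraction_per_channel(toy)
```

A channel whose fraction is clearly higher than the others would be a candidate for systematic under-quantification.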


Plot it!


Which data are LOD?

Showing a sample of 20 peptides


When I re-sample a few times, it's clear that the LOD abundance ratios come from peptides whose sample abundance is either NA or a low signal.

  • Could perhaps be because of NA in control channel abundance as well...

Summary:

  • There are several 0.01 ratios (=LOD) that are NA in the sample channel but for some reason are quantified anyway. These should be removed if we use reference-based quantification!
  • Still, most LOD values are in 126, 127C and 127N which indicates some kind of systematic bias in quantification

Input from Johan Gobom: TMT is not temp sensitive but water sensitive -> condensation reaction w/water -> no primary amine bindings -> BAD!

  • Is there anything in the lab protocol that could potentially explain this? For example, that you always start with 126...

I've previously shown that detection depends only on the set: if a peptide is detected in one channel of a set, it is detected in all other channels of that set as well. But it may not be quantified in all of them. E.g. TMT does not bind for some reason in ch 126/127N/127C -> there is no corresponding peak in the spectrum -> the peptide is still detected but quantified as NA -> it is divided by the control channel (131), which was quantified, and the result is defined as LOD (0.01).
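The chain above can be written out as a small sketch. It encodes only my reading of the hypothesis, not documented PD2.2 behaviour: that a missing or very low sample abundance divided by a quantified control is reported as 0.01. Both function names are hypothetical, and the flooring of low ratios is an assumption.

```python
LOD_RATIO = 0.01  # placeholder ratio described in these notes


def ratio_as_reported(sample, control, lod=LOD_RATIO):
    """Mimic the hypothesised behaviour (an assumption, not confirmed)."""
    if control is None:
        return None          # no reference signal, so no ratio at all
    if sample is None:
        return lod           # NA sample is reported as the LOD ratio
    return max(sample / control, lod)  # assumed: very low ratios floor at LOD


def ratio_strict(sample, control):
    """Alternative: keep NA as NA instead of turning it into a number."""
    if sample is None or control is None:
        return None
    return sample / control


# Toy abundances, invented for illustration only
reported = ratio_as_reported(None, 50.0)
strict = ratio_strict(None, 50.0)
```

With the strict version, peptides that were never quantified in a channel stay missing and cannot pile up as a false peak at 0.01.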

So, how to deal with this? Can I just exclude these proteins? Or is there a general skew in the data?

Some more plots


Some channels look weird! E.g. distribution of abundance ratios of F6/127N shown below: It's centered around -2.5! How can that be?


Below are all the ratio distributions displayed by TMT channel.


So, 130C/N seem like proper data, and to some extent also 128-129. But 126 and 127N have skewed distributions! The 95% CI varies somewhat between channels, and 130N has a tighter CI.

  • Can this simply be corrected by median adjust?
  • Is quantification reliable in 126/127? Should I go on analyzing only 128/129/130?
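For the first question, a median adjust could look roughly like the sketch below. It assumes ratios are analysed on a log2 scale (the base used in the plots is not stated here) and that LOD and NA values have already been removed. The function name and the toy numbers are invented for illustration.

```python
import math
from statistics import median


def median_center(ratios_by_channel):
    """Subtract each channel's median log2 ratio from its log2 ratios.

    `ratios_by_channel` maps channel name -> list of ratios (None allowed).
    """
    centered = {}
    for channel, ratios in ratios_by_channel.items():
        logs = [math.log2(r) for r in ratios if r is not None and r > 0]
        if not logs:
            centered[channel] = []  # nothing to center in this channel
            continue
        m = median(logs)
        centered[channel] = [x - m for x in logs]
    return centered


# Toy data: 127N is shifted down by 2 log2 units relative to 130N
toy = {"127N": [0.125, 0.25, 0.5], "130N": [0.5, 1.0, 2.0]}
centered = median_center(toy)
```

Median centering only removes a constant shift per channel. If 126/127N also differ in shape (skew, heavier left tail), that difference remains after the adjustment, so the second question still has to be answered separately.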