Which patients to sample in clinical cohort studies when the number of events is high and measurement of additional markers is constrained by limited resources
Purpose: We consider an existing clinical cohort with many events but limited resources for investigating a further, potentially expensive marker. Biological material from the patients is stored in a biobank, but only a limited number of samples can be analyzed for the marker. The question arises as to which patients to sample when the number of events precludes standard sampling designs. Methods: Modifications of the nested case-control and case-cohort designs for the proportional hazards model are applied that allow efficient sampling in situations where the standard nested case-control and case-cohort designs are not feasible. These sampling designs are compared with simple random sampling and extreme group sampling, the latter including only patients with extreme outcomes, i.e., either with an event early in time or without an event until at least a later point in time. Results: The modified nested case-control design and the modified case-cohort design provide powerful methods for sampling in a clinical cohort with many events. Simple random sampling is usually less efficient. If the focus is on precise estimation of a potential effect in terms of a hazard ratio, extreme group sampling is not competitive. If the focus is on screening for important biomarkers, extreme group sampling markedly outperforms the other sampling designs. Conclusions: When it is not feasible to sample all events, a modified nested case-control or case-cohort design leads to efficient effect estimates in the proportional hazards model. If screening for important biomarkers is the primary objective, extreme group sampling is preferable.
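
As an illustration of the extreme group sampling described above, the following is a minimal Python sketch under stated assumptions, not the procedure used in the study. The function name `extreme_group_sample`, the thresholds `t_early` and `t_late`, and the per-group budget `n_per_group` are hypothetical; the abstract does not specify how the cut-off times are chosen or how the budget is split between the two groups.

```python
import random


def extreme_group_sample(times, events, t_early, t_late, n_per_group, seed=0):
    """Pick patients with extreme outcomes (illustrative sketch only).

    times  : observed follow-up time per patient
    events : 1 if the event was observed at that time, 0 if censored
    t_early, t_late : hypothetical cut-offs, with t_early < t_late
    Returns two sorted lists of patient indices: (early_events, event_free).
    """
    if not t_early < t_late:
        raise ValueError("t_early must be smaller than t_late")
    rng = random.Random(seed)

    # Group 1: event observed early, i.e. at or before t_early.
    early = [i for i, (t, e) in enumerate(zip(times, events))
             if e == 1 and t <= t_early]

    # Group 2: no event until at least t_late (observed time reaches t_late).
    late = [i for i, t in enumerate(times) if t >= t_late]

    # Assumption: if a group exceeds the budget, subsample it at random.
    early_sample = sorted(rng.sample(early, min(n_per_group, len(early))))
    late_sample = sorted(rng.sample(late, min(n_per_group, len(late))))
    return early_sample, late_sample


# Toy data, invented purely for illustration.
times = [1, 2, 3, 8, 9, 10, 2, 9]
events = [1, 1, 0, 0, 1, 0, 1, 0]
early_ids, late_ids = extreme_group_sample(times, events, 3, 8, n_per_group=10)
```

Patients censored before `t_late` and patients with an event between the two cut-offs fall into neither group and are never sampled, which is consistent with the design including only patients with extreme outcomes.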