Accurate measurement requires maximising the correlation between true scores and measured scores. Classical psychometric concepts such as construct validity and reliability are often difficult to apply in experimental contexts. To overcome this challenge, calibration has recently been suggested as a generic framework for experimental research. In this approach, a calibration experiment is performed to manipulate the latent attribute in question. The a priori intended true scores can then serve as a criterion, and their correlation with the measured scores, termed retrodictive validity, is used to evaluate a measurement method. It has been shown that, under plausible assumptions, increasing retrodictive validity is guaranteed to increase measurement accuracy. Because calibration experiments are performed with finite samples, it is desirable to design them so as to minimise the sampling variance of retrodictive validity estimators. This is the topic of the current presentation. For arbitrary distributions of true and measured scores, we analytically derive the asymptotic variance of the sample estimator of retrodictive validity. We analyse qualitatively how different distribution features affect estimator variance. We then numerically simulate the asymptotic and finite-sample estimator variance for various distributions with combinations of feature values. First, we find that it is preferable to use uniformly distributed (if possible, discrete) experimental treatments in calibration experiments. Second, inverse sigmoid systematic aberration has a large impact on estimator variance. Finally, reducing imprecision aberration decreases estimator variance in many but not all scenarios. From these findings, we derive recommendations for the design of, and resource investment in, calibration experiments.
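The kind of comparison described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the presentation: treatments X are drawn i.i.d. (rather than allocated in fixed proportions), measured scores follow a hypothetical linear model Y = X + e with independent Gaussian imprecision e and no systematic aberration, and retrodictive validity is estimated by the sample Pearson correlation r. The asymptotic variance is computed with the standard textbook delta-method formula for r in terms of central moments mu_pq, which is not necessarily the derivation used in the presentation. All function names are illustrative.

```python
import math
import random
import statistics

# Toy model (assumption, not from the presentation): treatment X drawn i.i.d.,
# measured score Y = X + e with independent Gaussian imprecision e.

LEVELS = [v / math.sqrt(2.0) for v in (-2, -1, 0, 1, 2)]  # 5 levels, unit variance


def draw_normal(rng):
    """Normally distributed treatment with unit variance (kurtosis 3)."""
    return rng.gauss(0.0, 1.0)


def draw_discrete_uniform(rng):
    """Discrete uniform treatment on 5 equally spaced levels (kurtosis 1.7)."""
    return rng.choice(LEVELS)


def simulate(draw_x, n, rng, noise_sd=1.0):
    """Draw n (true score, measured score) pairs from the toy model."""
    xs = [draw_x(rng) for _ in range(n)]
    ys = [x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation, used here as the retrodictive validity estimate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def delta_method_n_var_r(xs, ys):
    """Plug-in delta-method asymptotic variance of r, scaled by n (n * Var(r)).

    Uses the textbook formula in central moments mu_pq; requires mu_11 != 0.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]

    def mu(p, q):
        return sum(a ** p * b ** q for a, b in zip(dx, dy)) / n

    m20, m02, m11 = mu(2, 0), mu(0, 2), mu(1, 1)
    m40, m04, m22 = mu(4, 0), mu(0, 4), mu(2, 2)
    m31, m13 = mu(3, 1), mu(1, 3)
    rho2 = m11 ** 2 / (m20 * m02)
    return rho2 * (
        m40 / (4 * m20 ** 2) + m04 / (4 * m02 ** 2) + m22 / (2 * m20 * m02)
        + m22 / m11 ** 2 - m31 / (m11 * m20) - m13 / (m11 * m02)
    )


def closed_form_n_var_r(rho, kurt_x):
    """n * Var(r) for the toy model: (1 - rho^2)^2 * (1 + rho^2 (kurt_x - 3) / 4)."""
    return (1 - rho ** 2) ** 2 * (1 + rho ** 2 * (kurt_x - 3) / 4)


def mc_n_var_r(draw_x, n, reps, rng, noise_sd=1.0):
    """Monte Carlo variance of r across replicate experiments of size n, times n."""
    rs = [pearson_r(*simulate(draw_x, n, rng, noise_sd)) for _ in range(reps)]
    return n * statistics.variance(rs)


if __name__ == "__main__":
    rng = random.Random(1)
    rho = math.sqrt(0.5)  # unit-variance X and unit-variance noise give rho^2 = 0.5
    for name, draw, kurt in [("normal", draw_normal, 3.0),
                             ("discrete uniform", draw_discrete_uniform, 1.7)]:
        xs, ys = simulate(draw, 100_000, rng)
        print(name,
              "closed form:", round(closed_form_n_var_r(rho, kurt), 4),
              "delta method:", round(delta_method_n_var_r(xs, ys), 4),
              "Monte Carlo (n=100):", round(mc_n_var_r(draw, 100, 1000, rng), 4))
```

Under this toy model the delta-method formula reduces to n·Var(r) ≈ (1 - ρ²)²[1 + ρ²(κ - 3)/4], where κ is the kurtosis of the treatment distribution. Lower-kurtosis designs (κ = 1.7 for the five-level discrete uniform used here, κ = 1 for a two-level design) therefore give a smaller variance than normally distributed treatments (κ = 3). This agrees in direction with the first finding above, but the sketch ignores systematic aberration (such as the inverse sigmoid case) and assumes randomly drawn rather than fixed treatment allocations.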