Even if, say, the effect of background prayer always tended in the same direction, that would still confound the results of the STEP study: instead of comparing people with 0 (control) and 1 prayers, we would be comparing people with, say, 7 and 8 prayers. That would be sufficient to significantly decrease the statistical power of the study. You actually want background prayers to work in randomized directions so that they can cancel out. But again, I don’t know who would seriously defend the idea that the effect of uncontrolled background prayers must in general be randomized. The general large-scale behavior of humans may contain sufficient randomness to “average out” and thus be amenable to study using social science methods, but are we suggesting that God’s behavior does too? (Even in the case of regular studies, we are increasingly realizing today that many social scientific studies were skewed by the demographics of their most common participants.)
You are absolutely right that no study can control for every possible variable. But a good study should be able to control for the variables that matter for its level of statistical significance. Otherwise it’s a worthless study. Simply throwing more statistics (more people, more prayers, etc.) at a study which has massive potential for systematic error is useless.
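To make the point about systematic error concrete, here is a minimal sketch in Python. It is not modeled on the STEP data: the function name `estimated_effect` and all the numbers (a true effect of 0, an uncontrolled shift of 0.5, unit noise) are hypothetical values chosen only for illustration. It simulates a two-group comparison in which one group carries an uncontrolled shift, and shows that a larger sample makes the estimate more precise without making it more correct.

```python
import random
import statistics


def estimated_effect(n, true_effect=0.0, bias=0.5, noise_sd=1.0, seed=0):
    """Difference in group means when the treated group also carries
    an uncontrolled systematic shift `bias` (all values hypothetical)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    # The treated group gets the true effect plus the uncontrolled shift.
    treated = [rng.gauss(true_effect + bias, noise_sd) for _ in range(n)]
    return statistics.fmean(treated) - statistics.fmean(control)


# Random noise shrinks as n grows; the systematic shift does not.
for n in (100, 10_000, 200_000):
    print(n, round(estimated_effect(n), 3))
```

Under these assumptions the estimate settles near 0.5 (true effect plus bias) rather than near the true effect of 0, no matter how many participants are added.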
Second, the difficulty of designing powerful social science studies in general is also why I wouldn’t hang my hat on such studies. When fewer than half of experimental papers in psychology can be replicated, do we really want to use a set of poorly controlled studies as a basis to tell people that their prayers are useless?