In audiotactile dynamic capture, judgment of the direction of an apparent motion stream in a target modality (such as auditory motion) is impeded (hence ‘captured’) by a concurrent but directionally opposite apparent motion stream in a distractor modality (such as tactile motion), producing a cross-modal dynamic capture (CDC) effect. That is, the percentage of correct reports of the target motion direction is reduced. Previous studies have revealed effects of stimulus onset asynchrony (SOA) and of potential spatial remapping (induced by a crossed-hands posture) on CDC. However, the dynamic capture process under different postures could not be explored further because only two levels of temporal asynchrony were employed (either synchronous or with an SOA of 500 ms). This study introduced a broad range of SOAs (−400 ms to 400 ms, with the tactile stream preceding the auditory stream or vice versa) to explore the time course of audiotactile interaction in CDC under two spatial references: arms-uncrossed and arms-crossed postures. Participants judged the direction of auditory apparent motion in the presence of tactile distractors. The results showed that in the arms-uncrossed condition, the CDC effect was prominent when the auditory and tactile events fell within the temporal integration window (0–60 ms). However, when the tactile stream preceded the auditory stream by an SOA of 150 ms or more, acting as a cue, the CDC effect was reduced, and no CDC effect was observed with the arms-crossed posture. These results suggest that the CDC effect is modulated by both cross-modal interaction and the spatial reference (especially for the distractors). The magnitude of the CDC effect in audiotactile interaction may be accounted for by the reliability of tactile spatiotemporal information.