We propose a computational architecture of human joint action that accounts for interactions between higher- and lower-level coordination processes. A proof-of-concept implementation of the architecture is used to model the social Simon task, a well-known experimental task that reveals an interplay between higher- and lower-level processes. We show that the model generates results aligned with human performance data for four task configurations. This work contributes to an understanding of the mechanisms involved in joint action.