The purpose of this study was to examine how the choice of propensity score method affects treatment effect estimates when propensity scores are used with multilevel data. A simulation study was conducted for treatments assigned at the school level (level 2) to assess the quality of treatment effect estimation under three propensity score methods (the two-stage matching/weighting combination method, the inverse probability of treatment weighting method, and the matching method). The simulation conditions varied the ratio of treatment to control group members (1:3, 1:1), the number of groups (50, 100), and the average group size (20, 40, 50). The findings were as follows. Across all conditions, the standardized mean difference was smaller for the two-stage matching/weighting combination method than for the more widely used inverse probability of treatment weighting and matching methods.
Further, the mean squared error was smaller with the two-stage matching/weighting combination method than with the other two methods. In terms of relative standard error, the two-stage matching/weighting combination method was also more efficient than the inverse probability of treatment weighting and matching methods.
Across all conditions, the results supported the use of the two-stage matching/weighting combination method for multilevel data.
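For readers less familiar with the evaluation criteria named above, the following is a minimal sketch of how inverse probability of treatment weights, a weighted standardized mean difference, and the mean squared error of a set of estimates might be computed. It is not the study's simulation code: the function names are hypothetical, the weights are the usual average-treatment-effect form (1/e for treated units, 1/(1 - e) for controls), and the standardized mean difference is assumed to be scaled by the pooled unweighted standard deviation, which is one common convention among several.

```python
from math import sqrt
from statistics import mean, variance


def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights (ATE form, assumed here).

    Treated units get 1/e and control units get 1/(1 - e), where e is the
    estimated propensity score of the unit.
    """
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e)
            for t, e in zip(treated, propensity)]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights=None):
    """Weighted difference in covariate means between treated and control
    units, divided by the pooled unweighted standard deviation (an assumed
    convention; other definitions use weighted variances)."""
    if weights is None:
        weights = [1.0] * len(x)
    x1 = [v for v, t in zip(x, treated) if t == 1]
    x0 = [v for v, t in zip(x, treated) if t == 0]
    w1 = [w for w, t in zip(weights, treated) if t == 1]
    w0 = [w for w, t in zip(weights, treated) if t == 0]
    diff = weighted_mean(x1, w1) - weighted_mean(x0, w0)
    pooled_sd = sqrt((variance(x1) + variance(x0)) / 2.0)
    return diff / pooled_sd


def mean_squared_error(estimates, true_effect):
    """Mean squared deviation of replicate estimates from the true effect."""
    return mean((est - true_effect) ** 2 for est in estimates)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    treated = [1, 1, 0, 0]
    propensity = [0.5, 0.25, 0.5, 0.2]
    covariate = [1.0, 3.0, 1.0, 3.0]
    w = iptw_weights(treated, propensity)
    print(standardized_mean_difference(covariate, treated, w))
```

In a simulation of the kind described, a function like `standardized_mean_difference` would be evaluated on each covariate after matching or weighting to check balance, and `mean_squared_error` would be applied to the treatment effect estimates collected across replications of each condition.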