This issue has been migrated from #3791.
If we decide that a new membership event is a duplicate, we throw it away; however, by that point we have already created a state group for it.
These state groups show up in the database with entries in state_groups and state_groups_state, but the event referred to in state_groups is not present in events or anywhere else.
I think this is probably responsible for quite a lot of the state group blowup on IRC-bridged rooms on matrix.org.
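
As a rough way to check how many state groups are affected, the sketch below looks for rows in state_groups whose event is missing from events. It runs against a throwaway in-memory SQLite database with a heavily simplified schema: the column names (id, room_id, event_id) and the sample event IDs are assumptions for illustration, not a copy of the real tables, and the helper names are hypothetical.

```python
import sqlite3


def find_orphaned_state_groups(conn):
    """Return ids of state groups whose event is not present in events."""
    rows = conn.execute(
        """
        SELECT sg.id
        FROM state_groups AS sg
        LEFT JOIN events AS e ON e.event_id = sg.event_id
        WHERE e.event_id IS NULL
        ORDER BY sg.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db():
    """Build a tiny in-memory database (simplified, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (event_id TEXT PRIMARY KEY);
        CREATE TABLE state_groups (
            id INTEGER PRIMARY KEY,
            room_id TEXT,
            event_id TEXT
        );
        -- One event that was persisted normally, with its state group.
        INSERT INTO events VALUES ('$kept');
        INSERT INTO state_groups VALUES (1, '!room', '$kept');
        -- A state group created for an event that was later discarded
        -- as a duplicate, so no matching row exists in events.
        INSERT INTO state_groups VALUES (2, '!room', '$discarded_duplicate');
        """
    )
    return conn


if __name__ == "__main__":
    print(find_orphaned_state_groups(make_demo_db()))
```

On a real deployment the same LEFT JOIN could be run directly against the database, though on a large server it is likely to be slow and the exact column names should be checked against the actual schema first.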