This is a simple approach using the 'collect_set' function and some Pythonic operations:
    from pyspark.sql.functions import col, collect_set

    idLimit = 3  # define your limit

    # collect a list of distinct ids (this brings them to the driver)
    id_lst = (
        sourceDF
        .select(collect_set('id'))
        .collect()[0][0]
    )
    id_lst.sort()  # sort the ids (alphabetically if they are strings)
    id_lst_limited = id_lst[:idLimit]  # limit the list as per your defined limit

    # filter the source df using your limited list
    targetDF = sourceDF.filter(col('id').isin(id_lst_limited))
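To make the intended result easier to check without a Spark session, here is a minimal plain-Python sketch of the same distinct, sort, limit, filter steps. The sample rows, the `value` field, and the helper name `limit_by_ids` are hypothetical and only stand in for `sourceDF`; this is an illustration of the logic, not a replacement for the Spark code.

```python
# Hypothetical sample rows standing in for sourceDF (not taken from the question).
rows = [
    {"id": "c", "value": 1},
    {"id": "a", "value": 2},
    {"id": "b", "value": 3},
    {"id": "a", "value": 4},
    {"id": "d", "value": 5},
]


def limit_by_ids(rows, id_limit):
    """Keep rows whose id is among the first id_limit distinct ids in sorted order."""
    # Distinct ids, sorted: the counterpart of collect_set followed by sort().
    distinct_ids = sorted({row["id"] for row in rows})
    # Slice to the limit, as id_lst[:idLimit] does.
    kept_ids = set(distinct_ids[:id_limit])
    # Filter the original rows, preserving their order and any duplicates.
    return [row for row in rows if row["id"] in kept_ids]


result = limit_by_ids(rows, 3)
print(result)
```

With a limit of 3, every row for ids `a`, `b`, and `c` is kept (including both `a` rows) and the `d` row is dropped, which is the same shape of result `targetDF` should have.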