This part is written with less care because probably it's useless, but if you'll try to understand them, sorry for the ambigious use of language
First you can think the distance condition as choosing a center vertex for each set, such that distance between any vertex in their respective sets to these centers are $$$\le K$$$
In the contest I thinked about calculating answers for subtrees then merging them because of the small K, but I wasn't able to get anywhere, mainly because order you merge the subtrees affects the answer. I had two ides to bound the distance between two vertices, one was to check the vertices with maximum distances, and one was choosing a center vertex for a set, then checking distances from other vertices while adding to this set. In both cases, for a certain subtree you have $$$O(K)$$$ options, and it's easy to see that, it is unoptimal to maintain more than one set for merging, and again you don't need to maintain a partition of a subtree with more than minimum number of subsets needed. Unfortunetely, for both cases, subtrees could merge arbitrarly and I wasn't able to see anything to make it easier to calculate so these options were dead ends for me. (About the clowning)
Other way to think that felt natural to me was to think it as removing vertices then solving the problem for the forest like structure that is left. When you ignore the limit of $$$S$$$, a DP solution with complexity $$$O(NK)$$$ becomes very obvious, but there exists another solution using distances. As there is no limit for $$$S$$$, you can think the problem like partitioning leaves, without having any trouble, and when tree is stretched over it's diameter, you can see that there's always a vertex that is optimal to choose as a center of a set, which is on a diameter and has the exact distance $$$K$$$ to one of the vertices at the end of that diameter. Proving it is not that hard and after that point you can just remove all the vertices that can be in the group of this vertex, hence solving the problem recursively. But luckly problem is harder than that and we have a size condition for each set.
At this point, I thinked about something similar to calculating min cost max flow on a bipartate graph, and optimizing it with the diameter idea mentioned above. I guess it should lead to some greedy algorithm from here which I had quite a few, I was able to disprove couple of them but wasn't able to prove none. Thinking about it now, maybe it's not a bad idea to try proving one of them with something like Hall's Theorem. But as you can guess, this led to a dead end for me, for now; but I hope a cool solution can be derived from the idea here.