Judging from the comments in https://codeforces.com/blog/entry/67683, the ratings of some problems, especially the last problem in Div3 contests, are obviously overestimated, for example 1176F. Also, the difficulty shown for old problem sets may not fit the current situation. Therefore, is it possible to have a system that allows users to "recommend" the true difficulty of a problem, instead of relying only on the current Elo rating system to estimate it (Ref)?

Draft idea:

A user is allowed into the system if they are experienced with CF, say they have participated in 10+ contests, solved 100+ problems, and have a max rating > 1900.

Users can vote on a particular problem if they did not take part in the contest and they have solved it. Users who participated in the contest are excluded because (1) their performance during the contest has already been taken into account in the estimated rating, and (2) they may have read hints and tutorials after the contest and now think the problem is not difficult.
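The two eligibility rules above could be sketched roughly as follows. This is only an illustration: the `User` fields and both function names are hypothetical and do not correspond to any Codeforces API, and the thresholds are simply the ones suggested in the draft.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # All fields are hypothetical; they only mirror the criteria in the draft.
    contests_participated: int
    problems_solved: int
    max_rating: int
    solved: set = field(default_factory=set)           # ids of problems the user solved
    contests_joined: set = field(default_factory=set)  # ids of contests the user took part in


def is_experienced(user: User) -> bool:
    """Draft thresholds: 10+ contests, 100+ solved problems, max rating > 1900."""
    return (
        user.contests_participated >= 10
        and user.problems_solved >= 100
        and user.max_rating > 1900
    )


def can_vote(user: User, contest_id: int, problem_id: str) -> bool:
    """A user may vote only on a problem they solved without joining its contest."""
    return (
        is_experienced(user)
        and problem_id in user.solved
        and contest_id not in user.contests_joined
    )
```

For example, a user with max rating 2000 who solved 1176F in practice, without joining contest 1176, would pass `can_vote`, while the same user would be rejected if they had taken part in that contest.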

Users can vote on whether the shown difficulty is significantly overestimated (it should be at least 200 lower), significantly underestimated, or close to what they expect. If there is a strong tendency, update the estimated rating by a small step (50 or 100). Then the statistics are cleared, and people who have never voted on this problem can vote. Updates can be applied many times. For better visualization we may use intervals of 50 instead of 100.
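A minimal sketch of this voting round, to make the idea concrete. The draft does not say how many votes are needed or what counts as a "strong tendency", so `MIN_VOTES` and `STRONG_SHARE` are assumptions chosen only for illustration, and the class name is hypothetical.

```python
from collections import Counter

STEP = 50            # size of one adjustment; the draft suggests 50 or 100
MIN_VOTES = 20       # assumption: minimum number of votes before acting
STRONG_SHARE = 0.7   # assumption: share of votes that counts as a "strong tendency"


class ProblemVotes:
    def __init__(self, rating: int):
        self.rating = rating
        self.votes = Counter()  # counts of "over", "under", "ok" in the current round
        self.voters = set()     # everyone who has ever voted on this problem

    def vote(self, user_id: str, choice: str) -> bool:
        """Record a vote; return False if it is rejected."""
        if choice not in ("over", "under", "ok") or user_id in self.voters:
            return False
        self.voters.add(user_id)
        self.votes[choice] += 1
        self._maybe_update()
        return True

    def _maybe_update(self) -> None:
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return
        if self.votes["over"] / total >= STRONG_SHARE:
            self.rating -= STEP   # shown rating is too high
        elif self.votes["under"] / total >= STRONG_SHARE:
            self.rating += STEP   # shown rating is too low
        else:
            return                # no strong tendency yet, keep collecting
        self.votes.clear()        # new round; earlier voters stay excluded
```

Because `voters` is never cleared, each user votes at most once per problem, while the per-round counts reset after every update, so the rating can move in several small steps over time.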

What do you think about improving the estimated difficulty this way?

Well, the difficulty of a problem cannot be determined precisely, and it will be subjective for everyone. But making a personal evaluation for each user would be a laborious task, and somewhat useless. Therefore, the difficulty of a problem can be determined relative to other problems, or by the complexity of the algorithm that solves it. However, in general, this topic needs to be thought over and discussed.

However, in any case, if a person wants to learn to solve problems, they will find what is optimal for themselves :)

I think the easiest solution is to calculate the rating estimate for problems like 1176F (or all Div3 problems) by including not only Div3 participants, but also participants from Div1 and Div2.

That this calibration would work can be seen from the fact that many oranges and reds are solving 1176F.

This applies to non-participants as well.
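To illustrate how the set of participants feeds into the estimate, here is a rough sketch under a strong assumption: an Elo-style model in which a participant with rating r solves a problem of difficulty d with probability 1 / (1 + 10^((d - r) / 400)). This is not the actual Codeforces formula, and both function names are hypothetical. The difficulty is taken to be the value at which the expected number of solvers equals the observed number.

```python
def solve_probability(rating: float, difficulty: float) -> float:
    """Assumed Elo-style chance that a participant solves the problem."""
    return 1.0 / (1.0 + 10 ** ((difficulty - rating) / 400.0))


def estimate_difficulty(ratings, solved_count, lo=0.0, hi=5000.0):
    """Binary-search the difficulty at which expected solvers match solved_count."""
    for _ in range(60):
        mid = (lo + hi) / 2
        expected = sum(solve_probability(r, mid) for r in ratings)
        if expected > solved_count:
            lo = mid  # model predicts too many solvers, so the problem is harder
        else:
            hi = mid  # model predicts too few solvers, so the problem is easier
    return (lo + hi) / 2
```

Calling `estimate_difficulty` once with only the Div3 ratings and once with the ratings of everyone who attempted the problem (passing the matching solve counts) shows how much the estimate depends on who is included.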

I think it would be better and easier to recalibrate the formulas so that problems of the same level get the same rating in different contests. This requires experimenting with the formula, I guess.

Yep, I have plans to adjust formulas a little to fix the issue.