#### Introduction

When it comes to "problem-solving techniques", there are roughly 3 levels:

- Concrete algorithms (Kruskal's algorithm, Li-Chao tree, fast Fourier transform)
- General patterns (dynamic programming, greedy, square-root decomposition)
- Meta-strategies ("how do I even go about solving this problem?")

There is some grey area, but in general this classification works well.

Many tutorials have been written about ideas that fall into 1 or 2. But very little has been written about the third category. On the catalog, I think the only blogs that really qualify are this and this. There is also this valuable comment. As to why there is so little written, I think I can identify two reasons.

- Most strong contestants don't really consciously think about these. After solving a problem with FFT, it's hard not to know you used FFT. If you used DP, even if you did it without thinking that it's DP, it's not hard to recognize that you used DP later anyway. But about things in the third category... it's hard to recall how exactly your thought process went. Furthermore, at least I have never really learned (i.e. read about and understood) anything from that category. Rather my thought process has developed from a lot of practice and experience.
- When writing about algorithms, it is easy to know you are right. You just prove it. But now we are veering into a more uncomfortable realm, where it is harder to understand what is correct or what is even useful. It is not as bad as writing about psychology, but still. Even with this blog, as I'm writing this, I think "does it even make sense?".
- It is not clear if writing blogs of this type actually helps anyone. I still maintain that in your ability to solve problems, the most important thing that you have control over is experience and seeking substitutes for that (algorithms to come up with algorithms, reading other people's thought processes) etc don't work. However there are many people who in theory should have a lot of experience but whose results don't reflect that.

It's far too much for me to write and describe exactly how my thought process works, and I'm not at all convinced that this would be useful for anyone. But I think I've identified something that beginners regularly get wrong, so I'll try my best to explain that. $$$~$$$

#### The idea

##### The forcing fallacy

Many beginners have developed this "technique" for solving problems. They look at the constraints or use some other, usually misguided, intuition to tell whether it is DP, greedy, or something else. They then try to "force" the chosen technique on it. With greedy it is usually easy, but will most likely lead to wrong solutions. With DP it is harder, but they still try.

Many times, after I've explained to someone how to solve some certain problem, they ask "but can it be solved with DP?". There are many reasons one might ask this, but I'm certain that in some cases, it is exactly this. Something in the problem made them believe that this problem must be solved by DP. They tried to solve it directly with DP, trying to figure out states and whatnot, and failed.

**A note about 'can this problem be solved by X'**

Similarly, there are many questions and blogs of the form "is this DP or greedy". For example, there was once a blog asking for help in a problem. One of the comments said "it must be DP because greedy is wrong". There are several things wrong with that statement.

- The obvious, simple greedy strategies you tried were wrong. Nothing is ruling out the possibility that there is a chain of interesting observations giving rise to a different, more involved greedy algorithm.
- Not greedy does not imply DP.
- This is simply the wrong question to ask at this point. "Deciding" whether to use DP or greedy and then forcing your choice is just a bad approach.

There are even more examples. There was a funny blog, where someone tried to feed cheaters wrong solutions, and the cheaters rejected these solutions, because they were sure the correct solution would be to use "monotonic stack". This gives a possible explanation: perhaps something in the problem made them think this problem is about this "monotonic stack" and they tried to force it.

This forcing fallacy also extends to more specific algorithms, not just more general patterns like DP. For example, I once saw some segment tree code that was just complete nonsense. And this is exactly what happened here. The problem was one with range queries — a beginner assumed that it must be solved with a segment tree and tried to force it instead of thinking about the nature of the queries in that particular problem. And it didn't work because the problem probably can't be solved using a segment tree and this beginner was trying to push a square peg into a round hole. What makes this more painful is that the solution to this particular problem was much simpler than a segment tree.

High-rated users can sometimes make it worse. A common phrase you hear after a contest is "oh, E is just dsu + stack" and like... no, it's not. You left out all the reasons about why it works, how to use them and so on. It is not any easier to solve this problem by just knowing that it uses DSU and stack. It is, however much easier to solve it by knowing only the stuff you didn't tell me — I could probably come up with the DSU portion myself. Weirdly, I think we have a tendency to overemphasize the "named" portions of our solutions.

##### What to do instead of forcing

Maybe this technique of guessing and forcing DP or greedy works well enough for easy problems and got you to green, I can't really tell. Here's the thing though. Very few hard problems can be solved **directly** using greedy, DP or any common algorithm. They are used, yes, but **you can't solve most problems by simply deciding to use greedy**.

In most problems, you are presented with some structure (broadly speaking). In a harder problem, you can't directly calculate the thing you are asked to. Instead, what you should do is try to understand this structure. I mean really *understand* it. Not understand the words in the statement, understand what's going on inside. Gather observations. Maybe even forget for the moment what the problem is asking for and try to understand the situation in general. There will be examples below.

**A note about the word 'observation'**

To conclude this section, think about it like this. You're solving an integral.

Do you look at it and decide that "this integral looks like I should do a some additions and multiplications"? That's what this is like. Yes, you do need to add and multiply, but that information does not help you in any way. The essence, the idea of the solution is somewhere else. The same is true in hard CP problems. You might do some greedy things and some DP, but knowing that in advance doesn't help you.

#### Nested Rubber Bands

**Problem.** (1338D - Nested Rubber Bands) Given a tree. You want to draw the tree on the plane in the following way:

- Every vertex is represented by a closed, non-self-intersecting loop.
- Two vertices are adjacent if and only if their loops intersect.

What is the maximum length of a chain of loops in the drawing, where every loop is contained in the previous loop?

Why am I starting a blog (that's presumably for beginners) with a 2700 problem? Because it's a perfect example of what I'm talking about. In this problem, it's so blatantly obvious that you can't solve this problem directly by DP or greedy or some secret algorithm that only reds know about. In this problem, it's not even possible to create a naive solution for stress-testing. The topology-based problem statement stands in the way of everything you might try. There is no other way: you **have** to first understand what it really means to be able to put vertices in a chain like this. You must remove the geometry and topology, you need to find and prove some equivalent condition and you have to rephrase the problem into something that you can actually make an algorithm out of. Then you can do some DP (spoiler!) to solve the new and improved problem.

**The new and improved problem.** Given a tree. Pick two vertices and color some of the vertices in the path between them red. Then you may color some uncolored neighbors of the red vertices blue, in such a way that two adjacent vertices are never blue. What is the maximum number of blue vertices you can make?

*Image stolen from the editorial*

Yes, that problem is equivalent to the nested rubber bands problem! The new problem can be solved using DP. How to exactly come up with this transformation and how to use DP for the new problem are beyond the scope of this blog, but I think this perfectly illustrates the "two-phase" thought process.

- Gather observations, understand what's
*really*going on, convert the problem into something you can make an algorithm out of. - Solve the new and simpler problem directly, with DP, greedy or some classical algorithm.

#### Flipping Range

**Problem.** (1630D - Flipping Range) Given an array $$$a$$$ of length $$$n$$$ and a set $$$B$$$ of integers such that $$$b \le \lfloor \frac{n}{2} \rfloor$$$ for all $$$b \in B$$$. In one operation, you may pick a number $$$b \in B$$$ and change the sign of all numbers in a subarray of $$$a$$$ with length $$$b$$$. You may use this operation as many times as you like. What is the maximum sum of the array $$$a$$$ that you can end up with?

This is the perfect problem where misguided intuition will lead you to think about DP. Maximizing something with operations — that's DP, right? You try to formulate the states and... you're stuck. Because that's not what this problem is. Greedy may also seem enticing — go along the array and make the best flip at the time, right? Nope, that's wrong.

Forget the problem. Understand the situation.

Try to understand this. What are all possible subsets of positions that you can negate? Understand the space of all possible outcomes. Spoiler: once you understand them well enough, the solution to the problem will become trivial.

A nice thing to try is to solve a simpler variant of the problem. Often, a good idea is to think about the simplest non-trivial variant. Instead of having a set $$$B$$$ of operation lengths, let's just have one length, $$$b$$$.

Another good thing to always do is to cast out the irrelevant details. The irrelevant thing here are the values in the array. Right now, we are studying which subsets of the array one can negate. We do not currently care about the values other than that. So forget them, think about only "negated" vs "not negated". A binary choice. 1 for negated, 0 for not negated.

Now that we made these simplifications, let's start solving. The first thing we notice is that we can make the first $$$n - b + 1$$$ values whatever we want to. Simply go from left to right, if the $$$i$$$-th value is already what we want it to be, then leave it as it is. If not, we can negate the subarray starting at $$$i$$$.

But with this strategy, the last $$$b - 1$$$ bits are out of our control. It seems we can't do anything about them without messing up the first $$$n - b + 1$$$ bits. Yet there also seems to be some regularity as to what happens to the last $$$b - 1$$$ bits. I recommend you try this on paper. I'll take $$$n = 11$$$ and $$$b = 5$$$.

Notice anything? There seems to be a cyclic pattern. If we try to set just one bit to 1 on the left, another 1 pops up on the right, or once every 5 times, all the bits on the right are set to 1. Can we understand this better? And can we be sure that it is not possible to end up with a different configuration on the right if we apply the operations in a different way?

A good thing to think about in problems like this are *invariants*. Invariants are things that don't change after you perform an operation. In this case, look at only the positions that are divisible by $$$b$$$. Exactly one bit changes there. Now look at only the positions that give 1 modulo $$$b$$$. Exactly one bit changes there too. And so on.

Let's divide the positions into buckets by their indices modulo $$$b$$$. Every time we do an operation, the number of 1-s in each bucket changes exactly by 1. At any time, the numbers of 1-s in buckets must either be all odd or all even. And that is precisely what we needed! This explains the phenomenon we saw earlier. We can make the first $$$n - b + 1$$$ elements into whatever we want, and the last $$$b - 1$$$ positions are precisely what we need to "correct" the parities. There is one bucket we can't change with the last $$$b - 1$$$ positions, and the rest are needed to change every other bucket into the parity of that bucket.

Now back to the original problem, but still with just one $$$b$$$ rather than a set $$$B$$$. We must choose "odd" or "even", and then choose from each bucket an odd (even) number of elements to negate. Suddenly, the problem is easy! In each bucket you can sort the elements and figure out the prefix to negate if we have to choose an odd (even) number of elements.

**In a loose sense, this is greedy. It is nothing like the simple, obvious greedy we talked about at the beginning. It's not possible to find this greedy unless you go through a thought process similar to above, even if you cut a few corners here and there. The person saying "it's not greedy, therefore it is DP" would never have reached here.**

I'll not delve into the solution with a set $$$B$$$ instead of just one width $$$b$$$, but it is not hard and not the crux of the problem.

**Spoiler**

By the way, this is a very common technique in problems like "you are given operations X and Y, can you make $$$t$$$ starting from $$$s$$$/what is the lexicographically smallest $$$t$$$ you can make from $$$s$$$/how many different $$$t$$$ can you make from $$$s$$$".

- You identify some invariant, something that doesn't change when you apply the operation.
- You prove, using some construction, that anything satisfying this invariant is actually reachable.

Some people think these problems are more creative and interesting than, say, data structure or string problems, but they have their own standard tricks and techniques just like everything else.

#### What does it all mean?

In all problems I have shown, there are essentially two parts to the solution.

- Deeply understanding the problem, familiarizing yourself with the situation, making observations, reductions, rephrasing the problem.
- Actual computation, often in a relatively standard way, possibly a greedy algorithm or DP.

Of course, this picture is simplified. In practice, the steps are very much intertwined, and it is usually hard to write down the two phases so explicitly separately. But the general idea stays the same. Doing greedy or DP happens at the end of your thought process, not at the beginning of it.

Finally, I must say that I myself am guilty of trying to force an algorithm (namely, FFT) on problems and learned pretty quickly that it doesn't work. This is in fact the source of the name -is-this-fft-.

Well, the "Is this FFT" question really goes mad when someone calls every problem with a $$$998\,244\,353$$$ modulo a "FFT Problem". You ain't that mad though!

Indeed, this was a part of it. It was a few years ago before using 998244353 in every problem to "disguise" the modulo became really common. The original "is this fft" problem was this one, as $$$40961 = 2^{13} \cdot 5 + 1$$$.

I think the most well known book dealing with the "meta-strategies" of problem solving is George Polya's How to Solve It.

It's one of the first attempts to try to explicitly state the mental actions happening during problem solving.

As you mention, I'm not sure how useful such resources are. I learned to solve problems before reading this book, and there is no substitute for actually solving problems on your own. Also, the book is perhaps aimed more towards teachers.

Nevertheless, I thought I should mention it since it fits with the general theme of the blog.

[gone]

[gone some more]

Not to be rude or anything, but it's probably because you're cyan yourself and it's weird to make a generalization of this sort, since it sort of implies that this is a prerequisite to hit a rating of 1600, which it most definitely is not (however, it certainly helps in becoming a better problem solver). Also, even if it is true, it is quite dismissive — and not helpful at all — to say "isn't X well known by everyone who satisfies ABC conditions" because it undermines the point of writing something that will inform people who don't satisfy ABC conditions about X.

Let's note the key words/phrases that you have used to justify downvoting my comments.

1. you're a cyan yourself

2. make a generalization

3. quite dismissive

4. undermines

Now, please read my comment again with an open mind. It was a genuine question and none of the things you noted above.

It's very easy to mischaracterize someone's comment and dump all your -ve biases onto it even when all it has done is ask a simple question. But its very telling that the number 1 thing you said was "you're a cyan". Even if my comment is everything you say it is, what does my color have to do with it? Would this comment be okay if I were a red?

I don't want to assume the intentions of people who have downvoted my comment but it could be just downvoting someone because of their color. If it is so, this is nothing new. All this does is discourage lower rated people from voicing there opinion/engaging in the conversation (however right or wrong one might think their opinion/point of view is).

However I don't want to think that -ve. Maybe what I said was wrong, idk. In that case all that was needed was a simple reply saying "no its not well known" or something :) .

If you were a red, it would still have been downvoted, since it comes off as arrogant. No matter how smart a cyan is (I am not questioning your intellect — there are a lot of very smart cyans, some of them having invented widely used data structures that people use in hard problems), anyone who reads such a comment by them could have doubts about their understanding of "a concept X being well-known by people in category Y", since they are not in category Y themselves. If you had mentioned that this is well-known among your > green/cyan friends, it would have been more believable. I guess my comment about you being a cyan could have been misconstrued as being ratist in that sense, but it was all about not being in category Y yourself.

From my understanding of English, "Isn't this XYZ" can be both a genuine question and a rhetorical question. With a comment as short as yours, misunderstandings can happen. Sure, if it was a genuine question, then the answer is no, it is not well known. In fact, I know reds who still have this approach to problem solving.

Regarding the rest of your comment, picking out words without presenting the overall context is very questionable; my intention was to point out that your comment sounded like (to me, whether it was intended or not, and definitely to a lot of other people who downvoted you) saying that you only care about people who are > green/cyan. You might care only about people who are > green/cyan and there's absolutely nothing wrong about that, but that doesn't hold true for everyone else, and there exist people (like the author) who want to help out those people. If interpreted like this, your comment is quite discouraging for people who actually want to help out greens/cyans. There are really not many high-quality blogs like these that tell you what kind of mindset could help you while solving problems which is something a lot of people don't realize without experience, so it's just a loss for the community. In fact, your comment also sounds like it says that if you are > green/cyan, there is no point of reading the blog. If you read the blog, you can see that the author says it is intended for beginners (in the sense of how well they understand their own thought process and how matured their thought process is), and I know for a fact that beginners can be blue/purple/yellow/red too.

Fun fact: I haven't downvoted your comment at all.

I just downvoted your comment. FAQ What does this mean?

The amount of karma (points) on your comment and Reddit account has decreased by one. Why did you do this?

There are several reasons I may deem a comment to be unworthy of positive or neutral karma. These include, but are not limited to:

Am I banned from the Reddit?

No — not yet. But you should refrain from making comments like this in the future. Otherwise I will be forced to issue an additional downvote, which may put your commenting and posting privileges in jeopardy. I don't believe my comment deserved a downvote. Can you un-downvote it?

Sure, mistakes happen. But only in exceedingly rare circumstances will I undo a downvote. If you would like to issue an appeal, shoot me a private message explaining what I got wrong. I tend to respond to Reddit PMs within several minutes. Do note, however, that over 99.9% of downvote appeals are rejected, and yours is likely no exception. How can I prevent this from happening in the future?

Accept the downvote and move on. But learn from this mistake: your behavior will not be tolerated on Reddit.com. I will continue to issue downvotes until you improve your conduct. Remember: Reddit is privilege, not a right.

I just downvoted your comment. FAQ What does this mean?

The amount of karma (points) on your comment and Reddit account has decreased by one. Why did you do this?

There are several reasons I may deem a comment to be unworthy of positive or neutral karma. These include, but are not limited to:

Rudeness towards other Redditors,

Spreading incorrect information,

Sarcasm not correctly flagged with a /s. Am I banned from the Reddit?

No — not yet. But you should refrain from making comments like this in the future. Otherwise I will be forced to issue an additional downvote, which may put your commenting and posting privileges in jeopardy. I don't believe my comment deserved a downvote. Can you un-downvote it?

Sure, mistakes happen. But only in exceedingly rare circumstances will I undo a downvote. If you would like to issue an appeal, shoot me a private message explaining what I got wrong. I tend to respond to Reddit PMs within several minutes. Do note, however, that over 99.9% of downvote appeals are rejected, and yours is likely no exception. How can I prevent this from happening in the future?

Accept the downvote and move on. But learn from this mistake: your behavior will not be tolerated on Reddit.com. I will continue to issue downvotes until you improve your conduct. Remember: Reddit is privilege, not a right.

How would you avoid getting trapped by observations/reductions that seem useful but aren't actually relevant to the solution? I know that has happened to me quite a few times.

Yes, it is a well-known problem — getting attached to a certain "path" because it seems useful. Unfortunately I can't suggest anything better than to be aware of it and perhaps sometimes to consciously force yourself to think of something else.

I am not a great problem solver at all, but for my experience, you just can't.

Not like "you will always get trapped by observations", but more of "you will get trapped from time to time", and I don't think you can avoid it, because you literally can't know, if observations you made are even useful, either because you have some kind of a brain fog, and you can't evaluate relevancy of your find, or you haven't gone through enough problems, which let you determine more accurately if "observation X is useful for problem Y".

I haven't found the exact ratio when you explore the problem vs when you exploit everything you've found, and for every problem this ratio is probably very unique. The only relieving thing is that after you experience these ratios through practice, you will (probably) start to feel them, therefore you don't get trapped as often.

As a rule of thumb, in practice(because contests are a whole another animal) I would spent X minutes on working with stuff I've got before trying to look at the problem from a different angle, where X is a number that you're comfortable with, so you should find it by trial and error, since concrete numbers probably won't work for 2 different humans. Also don't make X too high, because in practice(not in contests) exploring a problem is not punishing, and might give you more insights.

FYI, This is a very famous phenomenon called Einstellung effect.

A previous blog post about it, https://codeforces.com/blog/entry/99578.

I just want to create a discussion and maybe change my point of view, so I will write this argument:

Isn't sometimes forcing a necessary part of learning to be a problem solver? I, myself, am not very experienced in tehniques and etc., and for some problems, forcing mught be the only way to find the solution to that problem (although that usually means the problem isn't so great). For example, at least in problems that would use divide and conquer or sqrt decomposition, I wouldn't know at first

whywould I be doing this. But then, as I continue forcing this ideology, I start to observe things that are conditions to spotting those tehniques in the first place-- conditions that wouldn't be native to some other approach (say, Segment Tree or Binary Lifting).This is all to say, am I really doing it

thatwrong?Maybe it is not that bad to learn that specific thing.

In my opinion, it is important to decouple the various levels of problem-solving techniques, as the author of this blog did in the first paragraph.

Forcing is important in the sense that if you have no problem-solving experience at all with a certain high-level idea, it is much better to look at concrete examples until you've internalized the high-level idea (as opposed to just learning "tricks" — think of this as generalization vs overfitting in machine learning).

I believe that it's very important to get a hang of high-level ideas in order to become a strong competitor (or have insane processing power to go through every low-level idea you can solve). It is helpful to treat your working memory as having a multi-level cache — your "L1" cache being high-level ideas, "L2" cache being patterns, and "L3" cache being concrete algorithms. The stronger you get at problem-solving, the less time your L1 cache fetches take (and the less noticeable they are) before you solve the problem. It is also useful to think of it like this: it is much easier to take smaller steps than directly jumping on to a large set of possibilities, and it is also a much more systematic way of approaching things.

In that sense, "forcing" to learn algorithms and why they work is the lowest level, higher on the list is patterns (like — is your structure a monoid, and if yes, is a segment tree good enough?) and the highest level is being able to think in an ad-hoc manner that is decoupled from trying to apply certain ideas/tricks.

Of course, most people don't do this systematically, which is also why learning is so hard to understand.

How do you map this analogy to purely "constructive" problems as in for example this Div.2 A or this Div.2 B I cannot think of a high level idea to map these kinds of problems to. Is there something I am missing?

Any help would be appreciated, thanks for the comment :)

Firstly I'd point you to this blog that I wrote to explain my understanding of how learning takes place; maybe that'll help you more than the brief comment to which you replied.

In the context of the first problem, for example, there is a high level idea that is seen a lot: "look at the final state." Binary searching for the answer is an example of this idea, if you want specific examples. Let's say there are $$$k$$$ liars in the group. What happens now? The number of people who say there are at least $$$x$$$ liars for $$$x \le k$$$ are saying the truth, and all people with $$$x > k$$$ are actually liars. To solve the original problem, you can just brute-force on the answer, and be done with the problem.

However, it is more instructive to explore this problem further.

What if you wanted to solve the problem with better complexity? There are two ways to proceed (the first leads to an $$$O(n \log n)$$$ solution and the second leads to an $$$O(n \log n)$$$ solution which can be made to run in $$$O(n)$$$ if you use counting sort):

Another high level idea helps: "represent things visually." Here we'll plot the number of liars "computed" from the statements minus the number of liars we assumed, versus the assumed number of liars. This is a non-increasing graph, so you can in fact do a binary search with the predicate: "is the number of liars deduced at most the number of liars assumed?" Remember to check that the answer from this predicate actually works, or just binary search for $$$0$$$ on the array of values of this function.

The other way is make this brute-force more efficient (this is a high-level idea). Let's consider the structure of the problem a bit more. As earlier, note that if there are $$$k$$$ liars, then any person $$$i$$$ with claim $$$x_i$$$ is lying iff $$$x > k$$$. So if we sort (another high level idea) the number of people by their claims, a certain suffix of this sequence is the set of the liars. Similarly, a certain prefix of this sequence is the set of truthful people. How do we now identify this partition (note that doing this is equivalent to solving the problem)? If the number of liars is $$$k$$$, then the 1-based index of the first liar is $$$n - k + 1$$$ (if they exist), and the index of the last truthful person is $$$n - k$$$ (if they exist). Can we solely decide the number of liars based on the indices and the claims, in a single iteration over the array? The answer is yes, and we do so like this:

As far as the second problem is concerned, it is just manipulation of equations (the high level idea involved is "rephrasing"). So basically you get $$$a_i \text{ mod } x = a_{n-i+1} \text{ mod } x$$$, which is the same as saying $$$x$$$ divides $$$a_{n-i+1} - a_i$$$. Since this happens for all $$$i$$$, this means $$$x$$$ divides $$$a_{n-i+1} - a_i$$$ for all $$$i$$$, which, upon rephrasing, means that it is one of their common divisors. The problem just reduces to finding the greatest common divisor of these numbers, which should be fairly straightforward.

Very good blog. I think that really understanding the problem is often the part that does not get the deserved attention. When writing any kinds of editorials/official solutions one may often go directly into the specific arguments, but I prefer explaining not only

what I did, but rather alsohow I got there, it seems to me it is more educational approach even if for some it may appear to be redundant and unnecessaryEspecially the part "High-rated users can sometimes make it worse. A common phrase you hear after a contest is "oh, E is just dsu + stack" and like... no, it's not. (...)" seems very relatable, I hate when I see this :P

From my experience, knowing your own thought process is extremely beneficial. To improve rating by 200~300, you just need to practice more, but to improve 500+ there has to be fundamental changes to one's thinking process. For example, the improvement on "recursive thinking" would be very helpful in tree DP problems, and some people just master it without actually noticing this, but some others need to be aware of it before they become proficient in this thinking manner.

how to be cm

I'd say emphasize on observations. Summarize common patterns from time to time, try to find similarities between different problems. Solve some exercises in discrete mathematics, do more proofs. Learn how to think differently, and simplify your thinking. Try to make the thinking systematic. And most importantly, have fun in solving problems, be free of stress, have good sleep.

As the post mentioned,

But about things in the third category... it's hard to recall how exactly your thought process went.Yeah, knowing your own thought process is extremely beneficial. But on the other hand, it's also extremely hard (and impractical?) to notice the true way you think

the most of the times. You can simply find lots of "patterns" after you come up with a problem, but the core concept, the general idea of the problem that can be applied to anywhere, usually is still as much hard as it would be to find.As a reference, I learned the process and the proof of Kruskal algorithm for MSTs long time ago, but only after 3 years did I first realize what the "greedy" approach is actually doing.

So, instead of forcing yourself to think about the thinking process every problem you solved, maybe the better way is just devote yourself in practicing more.

Of course, if you have a sudden sparkle after you solved something, don't miss it and give it a review.

I'd say it's hard but not impractical. Surprisingly, the people (me included) who went through psychological therapies (such as CBT) would have a better sensitivity over their minds. This helps me identify and classify different components of my thinking process. BUT, most people are not fond of this type of training, and I respect. Just to mention the possibility.