2020-07-08

Interview asked Question, how many threads are suitable to create?

What scenarios does concurrent programming apply to?
If the reason why you choose multi-threading is a "fast" word, there will not be so many moths in the interview. Have you ever asked yourself

Is concurrent programming fast in all scenarios?
Knowing it is fast, why is it fast? How to measure?
To know the answers to these two questions, we need an analysis process from [qualitative] to [quantitative]

Using multi-threading is to maximize the running speed of the program by setting the correct number of threads in the correct scene (I feel that you haven't said anything).

Translating this sentence to the hardware level is to make full use of CPU and I/O utilization.


The correctness of the two is guaranteed, and the purpose of maximizing the utilization of CPU and I/O can be achieved. The key is, how to do two [correct]?

When talking about specific scenes, we must show our professionalism. Give you two noun buff bonuses.


  • CPU-intensive programs
  • I/O-intensive programs



CPU-intensive programs
A complete request, I/O operations can be completed in a short time, the CPU still has a lot of calculations to process, which means that the CPU calculations account for a large proportion

If we want to calculate the sum of 1+2+...10 billion, it is obvious that this is a CPU-intensive program

Under [single core] CPU, if we create 4 threads to calculate in stages, namely:

Thread 1 calculation [1,25 Billion)

... and so on

Thread 4 calculation [75 Billion,100 Billion]

Since it is a single-core CPU, all threads are waiting for CPU time slices. According to the ideal situation, the sum of the execution time of four threads is equal to that of one thread 5 alone. In fact, we also ignore the overhead of context switching of four threads.

Therefore, the single-core CPU handles CPU-intensive programs, which is not suitable for multi-threading.

At this time, if a 4-core CPU is used, four threads are also created to calculate in stages. What happens?

Each thread has a CPU to run, and there is no waiting for CPU time slices, and there is no overhead of thread switching. In theory, the efficiency has increased by 4 times

Therefore, if it is a multi-core CPU to process CPU-intensive programs, we can fully utilize the number of CPU cores and apply concurrent programming to improve efficiency

I/O-intensive programs
Compared with CPU-intensive programs, a complete request, after the completion of the CPU operation, there are many I/O operations to be done, that is to say, a large proportion of I/O operations

We all know that when performing I/O operations, the CPU is idle, so we want to make the most of the CPU and not let it be idle.

Also in the case of a single-core CPU:
each thread executes the same length of CPU time and I/O time. If you draw the above figure for a few more cycles, the CPU operation time is fixed and the I/O operation Time-consuming becomes 3 times the CPU time-consuming, you will find that the CPU is idle again, then you can create a new thread 4, to continue to maximize the use of CPU.

In summary, we can make such a summary:

The higher the percentage of thread waiting time, the more threads are required; the higher the percentage of thread CPU time, the fewer threads are required.

At this point, I believe you already know the first [correct] scenario of using multithreading, how many threads are created correctly?

How many threads are appropriate to create?
If you ask this question during an interview, this is a comprehensive test of your theory and practice. To get it right, you must have [proficient/proficient/proficient] primary school arithmetic

From the above, we know that we have two scenarios that are CPU-intensive and I/O-intensive. Of course, the number of threads required in different scenarios is different.

How many threads are appropriate for CPU-intensive programs?
Some students have already found that, for CPU-intensive, it theoretically Number of threads = CPU Core number (logic)can be, but in fact, the number usually set CPU Number of cores (logical)+ 1, and why?

"Java concurrent programming actual combat" said:

Computing (CPU)-intensive threads happen to be suspended at some point because of a page fault or other reasons. There is just an "extra" thread that can ensure that CPU cycles will not interrupt work in this case.

So for CPU-intensive programs, the CPU Number of cores (logical)+ 1number of threads is the reason for the better experience value

How many threads are suitable for I/O-intensive programs?
The above has let everyone draw a few more cycles according to the picture (you can manually increase the ratio of I/O time to CPU time, such as 6 times or 7 times), so you will get a conclusion, for I/O intensive Type program:

Optimal number of threads (1/CPU Utilization)==1 + (I/O time consuming /CPU time consuming)

I am so considerate, and of course I am worried that some students do not understand this formula. We will manually bring the proportion of the above figure into the above formula:

This is the optimal number of threads for a CPU core. If there are multiple cores, the optimal number of threads for I/O-intensive programs is:

Optimal number of threads = CPU Number of cores* (1/CPU Utilization)= CPU Number of cores*(1 + (I/O time consuming /CPU time consuming))

Speaking of which, some students may have doubts. To calculate I/O-intensive programs, you need to know the CPU utilization rate. If I don't know these, how can I give an initial value?

According to the above formula, if almost all I/O is time-consuming, you can say that it is 2N (N=CPU core number) in pure theory , and of course 2N + 1 , (I guess this 1 is also a backup), not found The specific overthrow process is shown in the screenshot of [Concurrent Programming Combat-Chapter 8.2]. If you are interested, you can take a look at it yourself



In theory, in theory, in theory , this can achieve 100% CPU utilization

If the theory is easy to use, then there is no need to practice, and there will be no tuning. But in the initial stage, we can indeed use this theory as a pseudo-standard, after all, the difference may not be too much, so the tuning will be better

After talking about the theory, let's talk about the actual. I understand the formula (the end of the qualitative phase), but I have two questions:

How do I know the specific I/O time and CPU time?
How to check CPU utilization?
Yes, we need quantitative analysis

Fortunately, we are not the first to eat crabs. In fact, there are many APM (Application Performance Manager) tools that can help us get accurate data. If we learn to use these tools, we can combine theory and tune in. The process gets a better number of threads. I will briefly list a few here, which one to use, and the specific application requires you to research and choose yourself. Due to space limitations, I will not discuss it for the time being.


  • SkyWalking
  • CAT
  • zipkin

Having learned the basic theoretical knowledge above, what might the interview ask? How might they ask questions?

Interview Questions

Ask a question:
Suppose that the TPS (Transaction Per Second or Task Per Second) of a system is required to be at least 20, and then assume that each Transaction is completed by a thread, and continue to assume that the average time for each thread to process a Transaction is 4s

How to design the number of threads so that 20 transactions can be processed in 1s?

However, this is because the number of CPUs is not taken into consideration. There is no mine at home. Generally, the CPU core number of the server is 16 or 32. If there are 80 threads, it will definitely bring too much unnecessary thread context switching overhead (hope you can say this actively), this is Need to be tuned to achieve the best balance

Question

The calculation operation requires 5ms, and the DB operation requires 100ms. For an 8-CPU server, how to set the number of threads?

If you don’t know, please take the third grade final exam questions and do it again (tonight’s self-study stay).

Number of threads = 8 * (1 + 100/5) = 168 (number)

So if the DB QPS (Query Per Second) upper limit is 1000, how much should this thread number be set at this time?

Again, this does not take into account the number of CPUs, and then the next stage of fine tuning

Because a request not only includes CPU and I/O operations, the specific tuning process also needs to consider specific contents such as memory resources, network, etc.

Will increasing the number of CPU cores solve the problem?
Seeing this, some students may think that even if I calculated the theoretical number of threads, but the actual number of CPU cores is not enough, it will bring the overhead of thread context switching, so the next step is to increase the number of CPU cores, then we blindly increase the CPU Will the core solve the problem?

Speaking of the content of the mutex, I intentionally left a knowledge:

How to understand this formula?

This conclusion tells us that if our serial rate is 5%, then no matter what technology we use, the maximum performance can only be increased by 20 times.

How to understand the serial percentage simply and roughly (in fact, you can get this result through the tool)? Let's look at a little Tips:

Tips: Critical sections are all serial, non-critical sections are all parallel, the time to execute the critical section with a single thread / the time to execute with a single thread (critical section + non-critical section) is the serial percentage

Now you should understand what I said when explaining the synchronized keyword:

Minimize the range of the critical section, because the size of the critical section is often the bottleneck problem, do not use the pan like try catch

to sum up
Multi-threading is not necessarily more efficient than single-threading, such as the famous Redis (which will be analyzed later), because it is based on memory operations. In this case, single-threading can use the CPU very efficiently. While the multi-threaded use scenario generally has a considerable proportion of I/O or network operations

In addition, we have learned how to analyze qualitatively to quantitatively in combination with elementary school mathematics. Before starting with no data, we can use the empirical value mentioned above as a pseudo-standard, followed by step by step combined with reality Tuning (integrated CPU, memory, hard disk read and write speed, network status, etc.)

Finally, blindly increasing the number of CPU cores may not necessarily solve our problem, which requires us to strictly write concurrent program code

Soul question
We already know how many threads are suitable for creation. Why do we need to create a thread pool?
What do you need to do to create a thread? Why is frequent thread creation expensive?
Multithreading usually pays attention to the problem of shared variables. Why is there no thread safety problem for local variables?

No comments:

Post a Comment