Thread Pool & Process & Thread…
This article is based on questions I often encounter in interviews.
Today’s topic is one that, in my opinion, every developer (especially a backend developer) should know.
These concepts are general and do not depend much on the programming language, because they come from how the CPU and the operating system work. I will explain some of them theoretically.
Topics I will cover in this article:
What is Thread? What types are there? What is Urgent Thread? What are the differences?
What are Process and Child Processes? What are the differences with Thread?
What does Multi-Thread, Single-Thread process mean?
What is a thread pool? How is it managed?
What is the thread pool limit and is it possible to change it?
How does the Thread Scheduler work?
In this article, I will touch on topics such as Process Control Block and Processes table and so on.
Let’s start…
Thread & Process
Thread is one of the concepts that allows us to do more than one thing at the same time.
Threads are divided into Kernel Level and User Level threads, and regardless of the type, each thread can do only one job at a time.
A process is a running instance of a program. In other words, we can describe it as a container for a collection of threads.
Any program (VS Code, AnyDesk…) is just a program as long as it is not running; when we run it, a corresponding process is created.
At least 1 process must be running for each program that is launched. This includes having the operating system running.
When a process is started, certain resources are allocated to it, and threads within it use these resources.
Each process can create other processes. Such processes are called Child Processes.
Processes are created in a tree structure, that is, each child process can create its own child processes.
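To make the parent/child relationship concrete, here is a minimal Node.js sketch. The helper name spawnChildAndReport is my own and purely illustrative. It starts a second Node.js process and asks it to print its own PID and the PID of its parent.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: start a child Node.js process that prints its own PID
// and its parent's PID as JSON, then parse what it printed.
function spawnChildAndReport() {
  const result = spawnSync(
    process.execPath, // path of the node binary that is running this script
    ['-e', 'console.log(JSON.stringify({ pid: process.pid, ppid: process.ppid }))'],
    { encoding: 'utf8' }
  );
  return JSON.parse(result.stdout);
}

const child = spawnChildAndReport();
console.log('parent pid:', process.pid);
console.log('child pid:', child.pid, 'child ppid:', child.ppid);
```

The child gets its own PID, and its ppid points back to the process that created it. That link is what forms the process tree.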
One program can run in more than one process. The advantage is that the work can finish faster; the disadvantage is that it uses more RAM and puts more load on the CPU.
Each process is managed by the OS through a data structure called PCB (Process Control Block).
All information related to the process is stored in the Process table through the PCB .
If only 1 thread is working within the process, that process is Single-Thread .
If more than 1 thread works within the process, it is called a Multi-Thread process.
For example, if we want to run 30 tasks at the same time within 1 process, we are going to need 30 threads.
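As a small illustration of several threads living inside one process, here is a sketch that uses the worker_threads module of Node.js. The helper runWorkers is a name I made up for this example, and I start 3 threads instead of 30 to keep the output short.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: start `count` worker threads inside the current
// process and collect the { pid, threadId } that each one reports.
function runWorkers(count) {
  const workerSource = `
    const { parentPort, threadId } = require('worker_threads');
    parentPort.postMessage({ pid: process.pid, threadId });
  `;
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, { eval: true });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }
  return Promise.all(jobs);
}

const workersDone = runWorkers(3).then((reports) => {
  console.log('main pid:', process.pid);
  for (const r of reports) console.log('thread', r.threadId, 'runs in pid', r.pid);
  return reports;
});
```

Every worker reports a different threadId but the same pid as the main thread, because all of them belong to the same process.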
One of the main concepts is the Cluster, which is a way to create (fork) Child processes to improve performance. Since it is a very long topic, I will save it for the next article; for now, it is enough to know that such an important thing exists.
Difference between Process and Thread
A process is isolated, but threads share resources (data, code, files, memory).
Processes work in isolation from each other, that is, each one has its own resources, while the threads of one process share the same resources.
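A blocked thread does not stop the other threads of its process.
If a single-threaded process is blocked (for example, while waiting for I/O), all of its work stops until it is unblocked, although the OS keeps running other processes.
In a multi-threaded process, if 1 thread is blocked, the others can continue their work.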
Changes to the parent process do not affect child processes.
A change in the parent process does not affect its child processes, because each process has its own memory, but a change that the main thread makes to shared data affects the other threads.
A process takes more time to create than a thread.
Creating and terminating a thread takes much less time than creating and terminating a process.
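The "threads share memory" point from the list above can be sketched in Node.js with a SharedArrayBuffer. This is only an illustration under my own assumptions (the variable names are made up): a worker thread writes a value into shared memory, and the main thread reads it after the worker exits.

```javascript
const { Worker } = require('worker_threads');

// Memory that can be shared between the threads of one process.
const shared = new Int32Array(new SharedArrayBuffer(4));

// The worker thread writes 42 into the shared array and then exits.
const writer = new Worker(`
  const { workerData } = require('worker_threads');
  Atomics.store(workerData, 0, 42);
`, { eval: true, workerData: shared });

// Resolve with the value the main thread sees once the worker has exited.
const sharedResult = new Promise((resolve, reject) => {
  writer.once('error', reject);
  writer.once('exit', () => resolve(Atomics.load(shared, 0)));
});

sharedResult.then((value) => console.log('value seen by main thread:', value));
```

The main thread sees the value written by the worker without any message being sent. A child process would not see it, because a child process has its own separate memory.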
Difference between User Thread (UT) and Kernel Thread (KT)
Recognition by the OS
Only “KT” is recognized by the Operating System.
Implementation
“UT” is implemented by a user-level library and “KT” is implemented by the OS.
Hardware support
Only “KT” needs hardware support.
Difficulty of implementation
“UT” implementation is simple and “KT” implementation is very complex.
Thread Pool
The purpose of a thread pool is to keep the number of threads in the system under control: threads that have finished their work are not terminated but wait for new jobs.
When a Thread pool is created, a specific number of Threads are created and wait until suitable work arrives.
For example, the thread pool in Node.js is created automatically by libuv, and the default size of the “libuv thread pool” is 4.
To manage the thread pool we need a Queue with two main methods: enQueue (add a task to the end of the queue) and deQueue (take the next task from the front).
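The queue logic described above can be sketched roughly like this. The TaskPool class is hypothetical and written only for this article. Its "workers" are simulated with promises rather than real OS threads, so it shows the enQueue/deQueue bookkeeping and nothing more.

```javascript
// Hypothetical TaskPool: a fixed number of workers pull jobs from a FIFO queue.
// Workers are simulated with async functions, not real OS threads.
class TaskPool {
  constructor(size) {
    this.size = size;   // maximum number of jobs running at once
    this.running = 0;   // jobs currently being executed
    this.queue = [];    // jobs waiting for a free worker
  }

  // enQueue: add a job (a function returning a promise) to the end of the queue.
  enQueue(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.next();
    });
  }

  // deQueue: take the oldest waiting job, or undefined if the queue is empty.
  deQueue() {
    return this.queue.shift();
  }

  // Start queued jobs while there is a free worker.
  next() {
    while (this.running < this.size && this.queue.length > 0) {
      const { job, resolve, reject } = this.deQueue();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.next();
        });
    }
  }
}

// Usage: 5 jobs on a pool of 4; record the highest concurrency observed.
const pool = new TaskPool(4);
let peak = 0;
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const poolDone = Promise.all(
  [1, 2, 3, 4, 5].map((id) =>
    pool.enQueue(async () => {
      peak = Math.max(peak, pool.running);
      await sleep(50);
      return id;
    })
  )
);
poolDone.then((ids) => console.log('finished:', ids, 'peak concurrency:', peak));
```

With a pool size of 4, the fifth job stays in the queue until one of the first four finishes, so the peak concurrency never goes above 4.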
Let’s try to understand these concepts along with Multithreading .
I will use the pbkdf2() method of the crypto module, since it uses the Thread Pool in the background.
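A minimal sketch of such an experiment might look like the code below. The password, salt, iteration count and key length are arbitrary values I picked, and hashTask is just an illustrative helper name, so the timings on your machine will differ from the ones I mention.

```javascript
const crypto = require('crypto');

const start = Date.now();

// Hypothetical helper: run one CPU-heavy hash on the libuv thread pool and
// resolve with the elapsed time in milliseconds when it finishes.
function hashTask(id) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2('password', 'salt', 300000, 64, 'sha512', (err) => {
      if (err) return reject(err);
      const elapsed = Date.now() - start;
      console.log(`task ${id} finished after ${elapsed} ms`);
      resolve(elapsed);
    });
  });
}

// Start 5 tasks at once; the pool decides how many of them run in parallel.
const hashTasks = Promise.all([1, 2, 3, 4, 5].map(hashTask));
```

All 5 calls are issued immediately from the main thread, but the hashing itself runs on the threads of the libuv pool.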
What happened?
The default size of the “libuv thread pool” is 4, but in the example there are 5 tasks.
Each task is executed by one thread from the pool.
These threads are distributed across the CPU cores by the OS Thread Scheduler.
The main purpose of OS Thread Scheduler is to manage which thread is executed and when.
For example, threads that perform tasks such as Mouse movement and Keyboard use are considered Urgent Threads .
The Scheduler knows that those Urgent Threads should not wait long and tries to execute them first.
Without digressing, let’s return to the above issue:
Since the pool has 4 threads by default, the first 4 tasks start at the same time.
Since there is no free thread for the 5th task, whichever thread finishes its work first will execute the 5th task.
Therefore, the first 4 results are written to the console at about the same time (after 2 seconds), and then the 5th (+1 second).
How can we change the size of the thread pool?
For this we use the following env variable, which has to be set before the thread pool is first used:
process.env.UV_THREADPOOL_SIZE = 5 (mac, linux)
However, I don’t think this works on Windows, so there I set it in a run script:
"scripts": {
"start": "set UV_THREADPOOL_SIZE=2 & node threads.js"
},
If we give size=1, the results are written to the console one by one, and if we give size=2, they are written in a 2–2–1 pattern.
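If you want to avoid shell-specific syntax, one possible alternative is to start the script from a small launcher and pass the variable through the env option of child_process. This is only a sketch: runWithPoolSize is a name I invented, and the inline script just echoes the variable, where in practice you would pass the path of your own script (for example threads.js).

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a Node.js child process with a given pool size.
// The inline script only echoes the variable; a real launcher would pass
// the path of the script to run instead of '-e'.
function runWithPoolSize(size) {
  const result = spawnSync(
    process.execPath,
    ['-e', 'console.log(process.env.UV_THREADPOOL_SIZE)'],
    {
      env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
      encoding: 'utf8',
    }
  );
  return result.stdout.trim();
}

const seenByChild = runWithPoolSize(2);
console.log('child sees UV_THREADPOOL_SIZE =', seenByChild);
```

Because the variable is already in the environment when the child process starts, it is in place before the thread pool is created.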
Note: Standard Node.js modules like “fs” and “crypto” use the thread pool in the background.
Note: Although there are 4 threads in the default thread pool, in the following case all 6 tasks will start at the same time.
That is, all the results will be written to the console at about the same time, because network operations are async.
Such operations run outside the Thread Pool and are handled directly by the OS.
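Here is a rough sketch of this behaviour. To keep it self-contained I assume a small local HTTP server on 127.0.0.1 instead of a real website; the 100 ms delay and the request helper are my own choices for illustration.

```javascript
const http = require('http');

// Hypothetical local server so the sketch does not depend on an external site.
const server = http.createServer((req, res) => {
  setTimeout(() => res.end('ok'), 100); // pretend each response takes 100 ms
});

// Send one GET request and resolve with the elapsed time when the body ends.
// agent: false gives every request its own connection.
function request(port, start) {
  return new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      res.resume(); // discard the body, we only care about timing
      res.on('end', () => resolve(Date.now() - start));
    }).on('error', reject);
  });
}

const networkDone = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    const start = Date.now();
    // 6 requests: more than the default thread pool size of 4.
    Promise.all([1, 2, 3, 4, 5, 6].map(() => request(port, start)))
      .then((times) => {
        console.log('elapsed ms per request:', times);
        server.close();
        resolve(times);
      })
      .catch(reject);
  });
});
```

All 6 requests should finish at roughly the same time, with no 4 + 2 pattern. Keep in mind that resolving a host name with dns.lookup() does go through the thread pool, which is why the sketch connects to an IP address directly.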
Finally, let’s look at the current state of the CPU with the task manager:
I have added useful links to the titles for more information.
I hope it was helpful. Thank you for your attention.