LLMs for Embedded Multi-Threaded Environments
When deploying SS-LLMs in multi-threaded environments, keep the following core multi-threading concepts in mind to ensure applications run correctly.
1. Race Conditions
- What it is: Two or more threads access shared data at the same time, and the result depends on the order of execution.
- Example: Two threads incrementing a shared counter without synchronization:

```python
# Thread 1 and Thread 2 both run this:
counter = counter + 1
```

- Without locking, the final value may be incorrect because both threads might read the same value before either writes back.
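The lost update can be made visible in a short, runnable sketch; the `time.sleep` call is there only to widen the race window for demonstration:

```python
import threading
import time

counter = 0

def unsafe_increment():
    # counter = counter + 1 is really three steps: read, add, write.
    # Another thread can read the old value before this one writes back.
    global counter
    local = counter          # read
    time.sleep(0.05)         # artificially widen the race window
    counter = local + 1      # write (may overwrite another thread's update)

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 10 without a race; with it, updates are lost and the result is lower.
print(counter)
```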
2. Synchronization / Mutual Exclusion
- What it is: Mechanisms (locks, mutexes, semaphores, synchronized blocks) that ensure only one thread accesses a resource at a time.
- Example (Python threading with a lock):

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    with lock:
        counter += 1
```
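Running the locked increment from several threads shows that the mutex removes lost updates; a minimal, self-contained sketch of the same pattern:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:           # only one thread at a time enters this block
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: every increment survives
```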
3. Deadlocks
- What it is: When two or more threads wait indefinitely for resources locked by each other.
- Example:
- Thread A holds Lock1, waits for Lock2.
- Thread B holds Lock2, waits for Lock1.
- Both block forever.
- Mitigation: Use a consistent lock-acquisition order, or try-lock with timeouts.
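A minimal sketch of the ordering mitigation: the two threads request the locks in opposite argument order, but because each acquires the lower-id lock first, the circular wait described above cannot form:

```python
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def use_both(first, second):
    # Consistent acquisition order: sort by id() so every thread
    # takes the "smaller" lock first, breaking the circular wait.
    a, b = sorted((first, second), key=id)
    with a:
        with b:
            pass  # critical section touching both resources

t1 = threading.Thread(target=use_both, args=(lock1, lock2))
t2 = threading.Thread(target=use_both, args=(lock2, lock1))
t1.start(); t2.start()
t1.join(); t2.join()
print("finished with no deadlock")
```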
4. Starvation & Fairness
- What it is: A thread never gets CPU time or resources because other threads keep acquiring them first.
- Example: High-priority threads continually acquire a lock, starving a low-priority thread.
- Mitigation: Use fair locks, balanced thread pools, or scheduling strategies.
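One way to sketch fairness, assuming no library fair lock is available: a FIFO "ticket" lock built on `threading.Condition`, which grants the lock in arrival order so a steady stream of new contenders cannot starve a thread already in line:

```python
import threading
from collections import deque

class FairLock:
    """FIFO lock: waiters acquire in arrival order."""
    def __init__(self):
        self._cond = threading.Condition()
        self._waiters = deque()

    def acquire(self):
        me = threading.get_ident()
        with self._cond:
            self._waiters.append(me)        # take a ticket
            # Proceed only when our ticket reaches the head of the queue.
            self._cond.wait_for(lambda: self._waiters[0] == me)

    def release(self):
        with self._cond:
            self._waiters.popleft()         # hand off to the next waiter
            self._cond.notify_all()

# Usage: several threads contend, each served in arrival order.
fair = FairLock()
counter = 0

def bump():
    global counter
    for _ in range(1000):
        fair.acquire()
        try:
            counter += 1
        finally:
            fair.release()

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```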
5. Thread Safety
- What it is: Code that behaves correctly when executed by multiple threads at the same time.
- Example: Java’s `StringBuffer` is thread-safe; `StringBuilder` is not.
- Mitigation: Use thread-safe collections or immutable objects when possible.
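In Python, `queue.Queue` plays the thread-safe-collection role: an internal lock makes each `put`/`get` atomic, so producers need no extra synchronization. A minimal sketch:

```python
import queue
import threading

q = queue.Queue()  # thread-safe: put()/get() are internally locked

def producer(items):
    for item in items:
        q.put(item)

threads = [threading.Thread(target=producer,
                            args=(range(i * 100, (i + 1) * 100),))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

received = []
while not q.empty():
    received.append(q.get())
print(len(received))  # 400: every item arrived exactly once
```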
6. Concurrency vs. Parallelism
- Concurrency: Structuring a program to handle multiple tasks at once (e.g., overlapping I/O).
- Parallelism: Actually running tasks simultaneously on multiple cores.
- Example:
- Concurrency: Handling multiple client requests with async I/O.
- Parallelism: Running a data-processing algorithm split across CPU cores.
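The concurrency side can be sketched with `asyncio`: ten simulated requests overlap on a single thread, so the batch takes roughly one request's latency. (Parallelism would instead spread CPU-bound work across cores, e.g. with `multiprocessing`.)

```python
import asyncio
import time

async def handle_request(i):
    await asyncio.sleep(0.1)      # stands in for a network or disk wait
    return f"response {i}"

async def main():
    start = time.perf_counter()
    # All ten waits overlap, so this takes ~0.1 s, not ~1.0 s.
    results = await asyncio.gather(*(handle_request(i) for i in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(f"{len(results)} responses in {elapsed:.2f}s")
```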
7. Context Switching & Overheads
- What it is: Switching between threads incurs cost (saving/restoring state). Too many threads can degrade performance.
- Example: Spawning thousands of threads may be worse than using a thread pool.
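A minimal sketch with `concurrent.futures.ThreadPoolExecutor`: a small fixed pool reuses its threads across many tasks instead of paying creation and context-switch costs per task:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n

# Four long-lived worker threads service 1000 tasks; creating 1000
# threads would spend far more time on setup and context switches.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(1000)))

print(results[:5])  # [0, 1, 4, 9, 16]
```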
8. Memory Visibility & Ordering
- What it is: One thread’s updates to shared variables may not be visible immediately to others (due to CPU caches, compiler reordering).
- Mitigation: Use `volatile` (Java), atomic variables, or memory fences/barriers.
- Example (Java):

```java
private volatile boolean running = true;
```
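A Python analog of the flag above, as a sketch: CPython's GIL already gives cross-thread visibility for simple variables, but `threading.Event` is the idiomatic shared stop flag:

```python
import threading
import time

stop = threading.Event()   # shared flag; set() is visible to all threads
ticks = 0

def worker():
    global ticks
    while not stop.is_set():   # re-check the shared flag on every pass
        ticks += 1
        time.sleep(0.01)

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)
stop.set()                 # signal the worker to stop
t.join(timeout=1)
print("worker stopped after", ticks, "loop iterations")
```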