-
Notifications
You must be signed in to change notification settings - Fork 164
Description
We've recently been doing some profiling on one of our applications that uses re2j
quite heavily, and we're noticing a decent amount of thread contention coming from RE2
when doing concurrent patten matching (i.e. multiple threads using the same Pattern
instance).
Both RE2#get
and RE2#put
synchronize access on the monitor to obtain either a cached Machine
instance or instantiate a new one if there are none that already exist.
Here's a small example that when run demonstrates the blocking thread behavior:
public class ThreadContentionExample {
private static final int THREADS = 20;
private static final Pattern PATTERN = Pattern.compile("\\w+://.*");
public static void main(String ...args) throws InterruptedException {
Thread[] threads = new Thread[THREADS];
for (int i = 0; i < THREADS; i++) {
Thread thread = new Thread() {
@Override public void run() {
while (true) {
PATTERN.matcher("https://www.google.com").matches();
}
}
};
thread.start();
threads[i] = thread;
}
for (Thread thread : threads) {
thread.join();
}
}
}
The thread profile (via YourKit) looks as follows:
I did some JMH profiling with various other concurrent data structures, but it seems like all have slightly worse off performance than the current implementation at lower thread counts (within error bounds anyway). A ConcurrentLinkedQueue
or Dequeue
seems to be slightly more performant at higher thread counts. However, even if they were better, many of these more "exotic" classes are not available in GWT anyway or they're at a higher JDK language level (re2j uses 1.6 features now), so I think that precludes them from use.
That said, I was wondering if there were any thoughts how this synchronization bottleneck could be removed, or at least improved a little, in the case that there are many threads accessing the same Pattern
instance?
Our current approach is to just use ThreadLocal
Pattern
s in our application code (as we have bounded thread pools). I see that the ThreadLocal
approach was adopted in re2j too, but it lead to some memory leaks, so it was reverted.
This definitely isn't a dealbreaker for us, and it's not really a bug either, just food for thought.
Re2j
has been great! Thanks!