Skip to content

Thoughts on synchronized access on RE2 instances #46

@nicktrav

Description

@nicktrav

We've recently been doing some profiling on one of our applications that uses re2j quite heavily, and we're noticing a decent amount of thread contention coming from RE2 when doing concurrent patten matching (i.e. multiple threads using the same Pattern instance).

Both RE2#get and RE2#put synchronize access on the monitor to obtain either a cached Machine instance or instantiate a new one if there are none that already exist.

Here's a small example that when run demonstrates the blocking thread behavior:

public class ThreadContentionExample {

  private static final int THREADS = 20;
  private static final Pattern PATTERN = Pattern.compile("\\w+://.*");

  public static void main(String ...args) throws InterruptedException {
    Thread[] threads = new Thread[THREADS];
    for (int i = 0; i < THREADS; i++) {
      Thread thread = new Thread() {
        @Override public void run() {
          while (true) {
            PATTERN.matcher("https://www.google.com").matches();
          }
        }
      };
      thread.start();
      threads[i] = thread;
    }

    for (Thread thread : threads) {
      thread.join();
    }
  }
}

The thread profile (via YourKit) looks as follows:

screen shot 2018-02-12 at 11 00 38 am

I did some JMH profiling with various other concurrent data structures, but it seems like all have slightly worse off performance than the current implementation at lower thread counts (within error bounds anyway). A ConcurrentLinkedQueue or Dequeue seems to be slightly more performant at higher thread counts. However, even if they were better, many of these more "exotic" classes are not available in GWT anyway or they're at a higher JDK language level (re2j uses 1.6 features now), so I think that precludes them from use.

That said, I was wondering if there were any thoughts how this synchronization bottleneck could be removed, or at least improved a little, in the case that there are many threads accessing the same Pattern instance?

Our current approach is to just use ThreadLocal Patterns in our application code (as we have bounded thread pools). I see that the ThreadLocal approach was adopted in re2j too, but it lead to some memory leaks, so it was reverted.

This definitely isn't a dealbreaker for us, and it's not really a bug either, just food for thought.

Re2j has been great! Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions