Severe memory leak when setUseEngineSocketByDefault(true) #835
Btw the same memory leak happens if the client sockets are created in a simple while loop instead of concurrently. The following code will use more and more res mem until the machine runs out of physical memory (when using Conscrypt), despite the -Xmx512M flag being set. E.g. after creating 1.2 million client sockets, non-Conscrypt is using 360MB res mem; after creating only 330k client sockets with Conscrypt, res mem usage is approx 11GB, and memory usage rises forever. Simple loop code:
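(The loop code itself is collapsed in this thread. The sketch below is my reconstruction of that style of repro, not the original code; it assumes the keystore and system properties described at the end of this issue, and the port number is a placeholder.)

// Hypothetical reconstruction of the simple-loop repro, not the original code.
import java.io.InputStream;
import java.io.OutputStream;
import java.security.Security;
import javax.net.ssl.*;
import org.conscrypt.Conscrypt;
import org.conscrypt.OpenSSLProvider;

public class LoopRepro {
    public static void main(String[] args) throws Exception {
        Security.insertProviderAt(new OpenSSLProvider(), 1);
        Conscrypt.setUseEngineSocketByDefault(true); // the setting this issue is about

        SSLServerSocket server = (SSLServerSocket)
                SSLServerSocketFactory.getDefault().createServerSocket(8444);
        new Thread(() -> {
            while (true) {
                try (SSLSocket s = (SSLSocket) server.accept();
                     OutputStream os = s.getOutputStream()) {
                    os.write("Hello\n".getBytes());
                    os.flush();
                } catch (Exception ignored) { }
            }
        }).start();

        for (long i = 0; ; i++) {
            try (SSLSocket c = (SSLSocket)
                    SSLSocketFactory.getDefault().createSocket("localhost", 8444);
                 InputStream is = c.getInputStream()) {
                is.read(new byte[16]); // read the greeting, then close
            }
            if (i % 10000 == 0) System.out.println("connections: " + i);
        }
    }
}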
More tests: it makes no difference how the keystore self-signed cert is created. After splitting the code up into a client process and a server process, I can also confirm that the memory leak occurs both for server sockets and client sockets. After 300k connections, the test showed that the Conscrypt client was using 5.5G res mem and the server was using 4.5G res mem. The split client/server code is as follows:
Update: the same memory leak happens under AdoptOpenJDK with the OpenJ9 VM (jdk-14.0.1+7_openj9-0.20.0) instead of the HotSpot VM. The same leak also occurs when using OpenJDK 11 GA (build 11+28) from https://jdk.java.net/archive/ I also get the same leak when using Jetty with Conscrypt, and I have to restart Jetty approx every 24 hours to stop the machine from running out of memory.
Update: the memory leak goes away when a certain line is included. The exception that happens on JDK 13/14 when that line is not included is as follows:
I've made a little progress. I've written a thread-safe AllocatedBuffer pool. When 100k connections are generated in series, this reduces res mem from 3.3G to 2.3G. There is therefore still another resource leak somewhere. There was no improvement from the other setting I tried. Buffer pool test code:
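(The pool code is collapsed above. The following is a minimal sketch of the idea; the BufferAllocator/AllocatedBuffer method names are my recollection of Conscrypt's SPI, so verify the signatures against the Conscrypt version you build against.)

// Sketch of a thread-safe direct-buffer pool plugged in via Conscrypt's
// BufferAllocator SPI. API names are assumptions, not verified against 2.4.0.
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.conscrypt.AllocatedBuffer;
import org.conscrypt.BufferAllocator;

public class PooledAllocator extends BufferAllocator {
    // Pool buffers at one fixed size large enough for a TLS record, so any
    // free buffer can satisfy any normal request.
    private static final int BUFFER_SIZE = 18 * 1024;
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    @Override
    public AllocatedBuffer allocateDirectBuffer(int capacity) {
        final boolean poolable = capacity <= BUFFER_SIZE;
        ByteBuffer pooled = poolable ? free.poll() : null;
        final ByteBuffer b = (pooled != null)
                ? pooled
                : ByteBuffer.allocateDirect(poolable ? BUFFER_SIZE : capacity);
        b.clear();
        return new AllocatedBuffer() {
            @Override public ByteBuffer nioBuffer() { return b; }
            @Override public AllocatedBuffer retain() { return this; }
            @Override public AllocatedBuffer release() {
                if (poolable) free.offer(b); // oversized one-offs are left to GC
                return this;
            }
        };
    }
}

If the Conscrypt version in use exposes a hook such as Conscrypt.setDefaultBufferAllocator(...), the pool can be installed process-wide; otherwise it can be wired to individual engines.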
Thanks for the detailed reproduction and apologies for the slow response. Running (essentially) that code on MacOS, it fails by running out of file descriptors long before memory becomes an issue (about 2000 iterations). It seems to be leaking ~4 pipe fds each iteration, which suggests that the engine's internal pipes aren't being freed. Would you mind checking whether you're seeing the same behaviour as well? Although I'm slightly surprised your environment has that many fds available... With your code running, could you check the number of open fds on the process? Thanks!
@prbprbprb Thanks for looking into this, much appreciated. Btw I'm using the Main5 code (from my previous post) for the following tests, because that allows me to easily enable and disable the AllocatedBuffer pool. On CentOS 7, the number of open fds stays stable. I then tested on MacOS (10.15.4), and I found exactly the same situation. I do also see the process memory usage grow forever on MacOS, as queried using ps. Maybe you're getting an fd leak because you're using a more recent development version of Conscrypt, rather than the latest 2.4.0 release?
Yeah, it looks like there are two ways the pipe resources can get released. They're supposed to be released when an engine-based socket is closed, via freeIfDone, but that clearly isn't happening in this case and should hopefully be easy to debug. They also get released by the NativeSsl finalizer, so (speculation) it looks like on MacOS those objects aren't being GCed fast enough to prevent running out of file descriptors. On the minus side, that means this isn't your memory leak either, but once I fix this I can go looking for that.
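(For readers unfamiliar with that pattern: below is a minimal sketch of "eager free on close, GC-time backstop", which is the general shape of the two release paths described above. This is illustrative only, not Conscrypt's actual code.)

// Illustrative sketch: a native handle freed eagerly on close(), with a
// Cleaner as the GC-time backstop. NOT Conscrypt's implementation.
import java.lang.ref.Cleaner;

final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // State must not reference the owner, or the cleaner would never run.
    private static final class State implements Runnable {
        volatile long ptr; // stand-in for a native pointer (e.g. a pipe pair)
        State(long ptr) { this.ptr = ptr; }
        @Override public void run() { // backstop: runs when owner is GCed
            long p = ptr;
            ptr = 0;
            if (p != 0) nativeFree(p);
        }
    }

    private final State state;
    private final Cleaner.Cleanable cleanable;

    NativeHandle() {
        this.state = new State(nativeAlloc());
        this.cleanable = CLEANER.register(this, state);
    }

    @Override public void close() { cleanable.clean(); } // eager path, like freeIfDone

    private static long nativeAlloc() { return 1; } // stand-ins for JNI calls
    private static void nativeFree(long ptr) { }
}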
Actually no, I'm working from home due to the current global unpleasantness, so the path of least resistance for me to reproduce and debug a little was local IntelliJ and the 2.4.0 uber jar :) I'll switch back to remote Linux/Android though, as we need to squish this on all platforms, especially as the default socket implementation in the Android 11 preview releases is the engine one - so your help here is very much appreciated!
FWIW the number of open fds stabilises on Android too, bouncing between 60ish and 150ish.
It may not be all that useful, but I thought I'd mention: on Linux, I dumped the memory of the Java process after about 100k connections had been completed. I then did a hex edit of the memory dump and searched for occurrences of the "localhost" string.
One guess is that localhost is one of the parameters to a JNI call that is not being cleared out of memory. |
There are some patterns in the hex dump.
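(That kind of pattern search can also be scripted. A small sketch follows; the dump path is a placeholder, and for multi-gigabyte dumps you'd want to stream rather than read the whole file into memory.)

// Count occurrences of an ASCII pattern in a raw memory dump.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpGrep {
    public static void main(String[] args) throws IOException {
        byte[] dump = Files.readAllBytes(Paths.get("core.12345")); // placeholder path
        byte[] needle = "localhost".getBytes(StandardCharsets.US_ASCII);
        long count = 0;
        for (int i = 0; i <= dump.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && dump[i + j] == needle[j]) j++;
            if (j == needle.length) count++;
        }
        System.out.println("occurrences: " + count);
    }
}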
Aha, some googling has turned up the following: notice in the hex that there is a repeated sequence of two-byte identifiers. These are TLS cipher suites. See e.g. https://security.stackexchange.com/questions/166556/ssl-tls-cipher-suites-order-for-windows-2016-hosted-https-sites and https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.gska100/csdcwh.htm
So it looks like the 100k repetitions in memory are the contents of a buffer used for the TLS handshake. This solves the question of why the pattern appears once per connection; it is part of the handshake data.
Previously, this method closed the underlying socket first and then the SSLEngine. However closing a connected SSLEngine queues a TLS close notification which obviously can't be sent if the socket is closed. Also, the pending bytes prevent the engine from freeing its native resources including pipe file descriptors until the SSLEngine is eventually garbage collected. Fixing this exposed that the fix for google#781 was incomplete and relied on the native SSL data *not* being cleared on close and so that is also fixed herein. This may also help with google#835 but didn't help me to reproduce that.
#842 didn't get me closer on this... Using IntelliJ's memory profiler (which is pretty basic), I see your test app creating about 250 new Strings (and their associated byte arrays) for each 500 iterations. These mostly seem to be the "Hello" being read by the client and the hostname as it's being parsed by java.net.Socket. Not sure why these aren't being GCed correctly, but both socket implementations behave the same for me. I'll dig into it on Linux tomorrow and see if it behaves differently. I do see a few larger byte arrays that might be the kind of thing you're seeing, but IntelliJ can't track their creation...
Just an idea: maybe do some cheap stdout allocation tracking using the custom BufferAllocator code and make sure all the allocations are returned? If the numbers don't match up, you can log a stack trace of where they're allocated and dig into why they're not released elsewhere.
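(A sketch of that suggestion, counting allocations against releases. The SPI method names carry the same caveat as the pool sketch earlier in the thread: they are assumptions to verify against your Conscrypt version.)

// Sketch: count allocations vs releases to spot buffers that never come back.
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;
import org.conscrypt.AllocatedBuffer;
import org.conscrypt.BufferAllocator;

public class TrackingAllocator extends BufferAllocator {
    private final BufferAllocator delegate;
    private final AtomicLong allocated = new AtomicLong();
    private final AtomicLong released = new AtomicLong();

    public TrackingAllocator(BufferAllocator delegate) { this.delegate = delegate; }

    @Override
    public AllocatedBuffer allocateDirectBuffer(int capacity) {
        allocated.incrementAndGet();
        final AllocatedBuffer inner = delegate.allocateDirectBuffer(capacity);
        return new AllocatedBuffer() {
            @Override public ByteBuffer nioBuffer() { return inner.nioBuffer(); }
            @Override public AllocatedBuffer retain() { inner.retain(); return this; }
            @Override public AllocatedBuffer release() {
                released.incrementAndGet();
                inner.release();
                return this;
            }
        };
    }

    public void dump() { // call periodically; a growing gap means a leak
        System.out.println("allocated=" + allocated.get() + " released=" + released.get());
    }
}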
I've written a new test that varies the amount of data sent by the client and server, and measures res mem usage after a specified number of iterations. The results are as follows. Performing 10k iterations with Conscrypt and buffer pooling enabled, -Xmx32m:
Performing 10k iterations with Conscrypt and buffer pooling enabled, -Xmx64m:
Then I tested with Conscrypt disabled, with only 1k iterations this time (due to pure Java being slow), -Xmx32m:
The biggest question raised by these results: with Conscrypt enabled, why does it take 30x longer for the server to transmit 100kb than the client (441s vs 15s)? Note that I tried repeating the tests with the order of transmission reversed, so that the server reads first instead of writing first, and the results were exactly the same (in both the Conscrypt and non-Conscrypt tests). The source code for this test is as follows:
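(That test source is collapsed in this thread. The sketch below is my reconstruction of the general shape of such a harness; the port, sizes, and structure are assumptions, not the original code.)

// Rough reconstruction of a throughput/res-mem harness, not the original test.
import java.io.IOException;
import java.io.InputStream;
import javax.net.ssl.*;

public class TransferTest {
    public static void main(String[] args) throws Exception {
        int iterations = 10_000, bytesFromServer = 100_000;
        byte[] payload = new byte[bytesFromServer];
        SSLServerSocket server = (SSLServerSocket)
                SSLServerSocketFactory.getDefault().createServerSocket(8445);
        new Thread(() -> {
            while (true) {
                try (SSLSocket s = (SSLSocket) server.accept()) {
                    s.getOutputStream().write(payload);
                    s.getOutputStream().flush();
                } catch (IOException ignored) { }
            }
        }).start();

        long start = System.currentTimeMillis();
        byte[] buf = new byte[8192];
        for (int i = 0; i < iterations; i++) {
            try (SSLSocket c = (SSLSocket)
                    SSLSocketFactory.getDefault().createSocket("localhost", 8445);
                 InputStream in = c.getInputStream()) {
                int remaining = bytesFromServer;
                while (remaining > 0) { // drain the server's payload
                    int n = in.read(buf, 0, Math.min(buf.length, remaining));
                    if (n < 0) break;
                    remaining -= n;
                }
            }
        }
        System.out.println("elapsed s: " + (System.currentTimeMillis() - start) / 1000);
        // Res mem is then read externally, e.g. from ps output for this pid.
    }
}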
Another update: two weeks ago, I removed Conscrypt from my production Jetty instances (for the purposes of serving incoming requests). However, one of the Jetty instances still uses Conscrypt to make outgoing connections to retrieve external content.

Before: both Jetty instances needed restarting every 24 hours or so due to the memory leak.

After: the Jetty instance that does not use Conscrypt at all has been running for 2 weeks with no memory leak. However, the other instance, which does not use Conscrypt for incoming connections but does use it for outgoing connections, now has a slow memory leak that requires a restart every 7 days or so.

Bottom line: the memory leak applies both to Conscrypt server sockets and client sockets. This should make it easier to reproduce the issue, since it can be reproduced purely by making lots of outgoing connections to a broad variety of different web sites.
If it helps with memory profiling, here is code that will attempt to connect to as many different domains as possible (retrieving max. 1 URL per domain, to prevent hammering any particular web site). It spawns 200 threads, starts at Wikipedia, and branches outwards to locate as many different websites as it can connect to. Note that this is a much slower memory leak than the one that happens when Conscrypt is used for accepting incoming connections to Jetty.

import org.conscrypt.OpenSSLProvider;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.Security;
import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main11 {
    private static ArrayList<String> extractUrlsFromString(String content) {
        ArrayList<String> result = new ArrayList<>();
        String regex = "https://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(content);
        while (m.find()) result.add(m.group());
        return result;
    }

    public static String extractDomainFromUrl(String href) {
        href = href.trim().substring(href.indexOf("://") + "://".length());
        if (href.indexOf('#') != -1) href = href.substring(0, href.indexOf('#'));
        if (href.indexOf('/') != -1) href = href.substring(0, href.indexOf('/'));
        if (href.contains(":")) href = href.substring(0, href.indexOf(":"));
        if (href.contains("?")) href = href.substring(0, href.indexOf("?"));
        if (href.contains("\"")) href = href.substring(0, href.indexOf("\""));
        href = href.trim();
        return href.toLowerCase();
    }

    private static String getUrlContent(String url) {
        try {
            int maxLines = 200;
            URL urlObj = new URL(url);
            HttpURLConnection conn = (HttpURLConnection) urlObj.openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            conn.setAllowUserInteraction(false);
            conn.setRequestMethod("GET");
            conn.setUseCaches(false);
            conn.setDoInput(true);
            conn.setDoOutput(true);
            conn.connect();
            int statusCode = conn.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    statusCode == 200 ? conn.getInputStream() : conn.getErrorStream(), "UTF-8"));
            StringBuilder sb = new StringBuilder();
            String line;
            int c = 0;
            while ((line = br.readLine()) != null) {
                sb.append(line);
                sb.append('\n');
                c++;
                if (c > maxLines) break;
            }
            br.close();
            return sb.toString();
        }
        catch (Exception e) {
            return "";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pid: " + ProcessHandle.current().pid());
        Scanner scanner = new Scanner(System.in);
        System.out.println("Use conscrypt? (y/n)");
        boolean useConscrypt = scanner.nextLine().toLowerCase().startsWith("y");
        if (useConscrypt) Security.insertProviderAt(new OpenSSLProvider(), 1);
        Set<String> domainsVisited = Collections.synchronizedSet(new HashSet<>());
        Thread progressThread = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1000);
                    Process p = Runtime.getRuntime().exec("ps -o rss " + ProcessHandle.current().pid());
                    Scanner s = new Scanner(p.getInputStream()).useDelimiter("\\A");
                    s.nextLine();
                    System.out.println("Domains visited: " + domainsVisited.size() + ", Res mem (MB): " + (Integer.parseInt(s.nextLine().trim()) / 1024));
                    p.destroy();
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        progressThread.setDaemon(true);
        progressThread.start();
        var queue = new ConcurrentLinkedQueue<String>();
        queue.add("https://en.wikipedia.org/wiki/United_States");
        int threads = 200;
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                while (true) {
                    try {
                        String url = queue.poll();
                        if (url == null) {
                            Thread.sleep(1000);
                        }
                        else {
                            boolean retrieve = false;
                            {
                                String domain = extractDomainFromUrl(url);
                                synchronized (domainsVisited) {
                                    if (!domainsVisited.contains(domain)) {
                                        retrieve = true;
                                        domainsVisited.add(domain);
                                    }
                                }
                            }
                            if (retrieve) {
                                String content = getUrlContent(url);
                                if (queue.size() < 10000) {
                                    extractUrlsFromString(content).stream()
                                        .filter(u -> !u.contains("wikipedia.org") && !u.toLowerCase().contains(".pdf") && !u.toLowerCase().contains(".jpg") && !u.toLowerCase().contains(".gif") && !u.toLowerCase().contains(".png") && !u.toLowerCase().contains(".zip"))
                                        .forEach(u -> {
                                            String d = extractDomainFromUrl(u);
                                            synchronized (domainsVisited) {
                                                if (!domainsVisited.contains(d)) queue.add(u);
                                            }
                                        });
                                }
                            }
                        }
                    }
                    catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}
Since BioWrapper does not have a finalizer, we need to ensure that the native resources are freed when it goes away. In all paths, "networkBio.close()" needs to be called to free the native resources. This change rearranges some items to make sure that the native resource never gets leaked. Possible fix for leak identified in google#835
Thanks for the test program, @knaccc. I ran it and looked at the jeprof output, which confirmed your suspicions above: the leaked allocations appear to come from native code. Here are my steps for posterity:
Answer "y" to the question and let it run for a little bit.
@kruton Thanks. I see you're still working on your fix-bio-leak branch - please let me know when it's ready and I'll try it on my Jetty instance.
Yeah, I realized that I was redirecting stderr, so I missed some problems my initial patch caused. It would sometimes try to double-close the BIO. It also uncovered code trying to use the BIO after it was closed.
@kruton I've been running your patch since Saturday, and I think your changes have made a very significant improvement. Unfortunately, I think a slow leak still persists.

I have one Jetty instance with -Xmx4G (that only uses Conscrypt for incoming connections), and it's currently using 5.6G res mem. This is a huge improvement; previously this instance would have exceeded 20GB res mem within a couple of days. I'll keep it running for several days to see whether there is a slow leak or no leak at all.

The second Jetty instance, with -Xmx3G and using Conscrypt both for incoming and outgoing connections, started to exceed 10G res mem, but is experiencing a "slow" leak rather than a fast one. At this point I restarted the instance and used jeprof on it; the output is as follows.

Here are the exact steps I took to build Conscrypt 2.5.1 using your patch. Note that in order for it to build, I had to use BoringSSL commit 48cb69f8bd5606933e1844460551a4bc140622c0. Btw, is there a record of the BoringSSL commit that is used with each Conscrypt release? I'm never sure which commit to use.

git clone https://github.com/google/conscrypt
cd conscrypt
git checkout 2.5.1
docker build -t conscrypt-deploy release
containerId=$(docker run -td --rm=true conscrypt-deploy)
docker exec -ti $containerId bash
yum -y install gcc-c++
cd /usr/src
rm -fr boringssl
git clone https://boringssl.googlesource.com/boringssl/
cd boringssl/
git checkout 48cb69f8bd5606933e1844460551a4bc140622c0
mkdir build64
cd build64
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_ASM_FLAGS=-Wa,--noexecstack \
-GNinja ..
ninja
cd /
rm -fr conscrypt
git clone https://github.com/google/conscrypt
cd conscrypt
git checkout 2.5.1
rm -f common/src/main/java/org/conscrypt/ConscryptEngine.java
rm -f common/src/main/java/org/conscrypt/NativeSsl.java
wget --directory-prefix=common/src/main/java/org/conscrypt https://raw.githubusercontent.com/google/conscrypt/30f9218ccc9eaae776220401dbe08c1f1d6bebad/common/src/main/java/org/conscrypt/ConscryptEngine.java
wget --directory-prefix=common/src/main/java/org/conscrypt https://raw.githubusercontent.com/google/conscrypt/30f9218ccc9eaae776220401dbe08c1f1d6bebad/common/src/main/java/org/conscrypt/NativeSsl.java
echo "nostrip=true" >> gradle.properties
sed -i 's/"-O3",/"-O3","-g",/g' openjdk/build.gradle
./gradlew conscrypt-openjdk:build
exit
docker cp $containerId:/conscrypt/openjdk/build/libs/conscrypt-openjdk-2.5.1-linux-x86_64.jar ./
docker stop $containerId
Update: my production Jetty server (using Conscrypt for incoming connections only) with -Xmx4G is bouncing between 6G and 7G res mem usage, and has been running for 2 weeks. I'm fairly confident you've therefore fixed the memory leak for server sockets. My other production Jetty server (using Conscrypt for both incoming and outgoing connections) with -Xmx3G will exceed 12G res mem in just a couple of days, depending on the extent to which it makes outgoing connections. I therefore think there is still a memory leak for client sockets.
I started looking into this under the assumption that the main issue was fixed in Conscrypt 2.5.1 (even though engine sockets are now the default). My assumption was that a TLS client leak remains in Conscrypt and that it would show up with a simple sustained-load script. Before adding profiling tools or Jetty into the mix, I wanted to see an actual leak in a correct, entirely self-contained working program. So this certainly won't catch any of the following cases:
To be honest, with a simple self-contained program I'm not seeing an obvious client leak. Non-heap memory is growing at 1 byte per 12 sockets. It's hard to know if this is just JVM growth, as there is a lot of accounting going on in the JVM; it could just be some regular JVM native structures. At this point I could either:
But I thought I'd ask for advice from anyone still seeing this for client cases: what specific scenarios and environments can you still replicate the problem with? Bytes per request since 10 seconds after JVM launch:
Pastebin for the IntelliJ Kotlin load script: https://pastebin.com/Ctff1Ddj
@yschimke Thanks for looking into this. I could not replicate the memory leak with a simple loopback test, which is why I wrote Main11.java. Main11 starts at Wikipedia and follows links outward to as many different domains as possible, so that it encounters a wide variety of TLS implementations/configurations. Note that it will take a few hours for the memory leak to show. My server that uses Conscrypt to make outbound connections leaks about 10GB a week (so I just restart the process once a week, and that's a workable solution).
The server socket leak was fixed by #893, which is not yet merged as far as I can see. I'm very happy to deploy #942 to my server and leave it running for a week to see if I get the leak. I just need to know which BoringSSL commit to use when building (I use the build method I documented here: #835 (comment)).
@knaccc Thanks for the context, really helpful. After applying the PR fixes, do you think it's definitely a Conscrypt leak (incorrectly dealing with varied and bad servers etc. in the wild), versus a bug in the web crawler behaviour in Main11, or generally in the JVM's HttpsURLConnection, or in Jetty in the original problem statement? If we suspect it's a bug with varied TLS connections, then a useful mid-point would be your script producing a known list of 10k hostnames to make round-robin connections to, and then seeing whether we see the leak. It doesn't look like Main11 is designed to handle errors cleanly, though; br is only closed in the successful case. Are we expecting that good GC performance should cause the memory to be reclaimed within the space of the week, even though it's possibly native allocations, which don't tie back to GC pressure?
Yes, because if I simply remove Conscrypt as an OpenSSLProvider on the server, there is no longer a memory leak. Additionally, you will see earlier in this thread that I've used jemalloc to demonstrate that the leak is a result of calls to native code (OPENSSL_malloc) rather than Java object leaks.
FYI my real-world code doesn't look anything like Main11. Instead, it retrieves images from a list of specified web pages. That code uses Apache HttpClient, which in turn will use Conscrypt if it is registered as a provider. It has Xmx set to 3GB, so garbage collection should have kicked in long before it exceeded 10GB of res mem.
It's possible that the leak happens due to unexpected connection issues, or a variety of uncommon circumstances that only show themselves in the real world. It's annoying that the leak is so slow, which makes it much harder to iteratively narrow down the cause.
Btw I tried to build using the procedure documented here, except with a more recent BoringSSL commit, and the build failed.
@prbprbprb Is the cleanest way to get a built jar to grab it from the GitHub PR artifacts? #942 -> https://github.com/google/conscrypt/suites/1705327032/artifacts/32133416
I suspect it's exactly that, e.g. some unexpected exception ordering breaking resource release. For example:
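(An illustrative, hypothetical shape of that kind of bug; this is not actual Conscrypt code.)

// Hypothetical illustration: if close() frees resources only after other
// calls that can throw, an exception skips the release and the native
// resource leaks until (or unless) a finalizer runs.
class LeakySocket implements AutoCloseable {
    private final NativeResource bio = new NativeResource();

    @Override
    public void close() throws Exception {
        sendCloseNotify();   // can throw on a broken connection...
        bio.free();          // ...and then this line never runs
    }

    // Safer: guarantee the release path runs regardless of exceptions.
    public void closeSafely() {
        try {
            sendCloseNotify();
        } catch (Exception ignored) {
            // peer already gone; nothing useful to do
        } finally {
            bio.free();
        }
    }

    private void sendCloseNotify() throws Exception { /* may throw */ }

    static class NativeResource {
        void free() { /* release native memory / fds */ }
    }
}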
Yeah, 2.5.1 won't build against the latest BoringSSL due to changes. Rather than trying to build against a patched 2.5.1, I'd suggest patching the master branch and building against that. There isn't anything else since 2.5.1 that should break you (famous last words). E.g.:
Should be, when they work! (Windows and MacOS are both currently broken.) I haven't actually tried it myself, though. Looking at the one you linked, it doesn't look like the uber jar is built, but everything else seems to be in there.
If you want to stick to 2.5.1, I can probably get you the exact BoringSSL hash it was built against; I'll just need to bring our Windows build VM up. Note to self: we should document this hash with each release in future.
@prbprbprb I tried this:

git clone https://github.com/google/conscrypt
cd conscrypt
git checkout master
docker build -t conscrypt-deploy release
containerId=$(docker run -td --rm=true conscrypt-deploy)
docker exec -ti $containerId bash
cd /
rm -fr conscrypt
git clone https://github.com/google/conscrypt
cd conscrypt
git checkout master
wget https://github.com/google/conscrypt/pull/942.patch
git apply 942.patch
./gradlew conscrypt-openjdk:build

and it failed with:
Am I doing something wrong?
I'm going to focus on this first, actually. I'd rather spend the time once getting clean CI builds off PRs than running manual steps. I'm still catching up on Conscrypt development here, so consider anything I'm working on a sideline.
Any update on this?
@pparth Not from me; I'm not sure if anyone else is active on this at the moment. But if you do want to dig in, you should be able to grab builds from CI now rather than building yourself with Docker etc.
This is a chart of hourly recorded memory usage of a Jetty web server over a period of 470 hours, starting from the time the Jetty process was first launched. The JVM Xmx is set to 3GB, and it's running on Ubuntu 20.04 LTS. The web server uses Conscrypt both for incoming and outgoing connections, including for outgoing TLS JDBC connections (by installing Conscrypt as the default security provider). It is using the CI build based on commit 52f3cf1 (January 28, 2021).

You can see that suddenly, in the last few hours, there was a 2.6 GB spike in res mem usage (from about 8.3GB to 10.9GB). This coincided with a database upgrade, which involved the DB being shut down twice (each time for about 10 minutes). This would have meant that new JDBC connections via Conscrypt could not be made to the DB. Logs show that approx. 1700 connection exceptions occurred during this period.

It's possible that the leak is due to existing outgoing JDBC connections being suddenly dropped, or to new outgoing JDBC connections failing to be established. It's also possible that the leak was caused by the server accepting lots of incoming HTTPS requests which were blocking while the JDBC connection pool waited to re-establish connections to the DB.

Note that several hours have passed since the DB downtime, and although the Jetty server has gone back to normal, and is able to serve HTTPS connections and communicate with the DB via JDBC, its res mem footprint has stayed permanently high.
I was able to produce a working test case that demonstrates a memory leak when a socket fails to connect. I've documented it in a fresh issue, so that people don't have to read through this very long thread.
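(The fresh issue has the actual test case; the general shape of that kind of reproduction is just a loop of failing connects, sketched below with placeholder details.)

// Sketch of a failing-connect loop; port and counts are placeholders, see
// the linked issue for the real test case.
import javax.net.ssl.SSLSocketFactory;

public class FailingConnectRepro {
    public static void main(String[] args) {
        for (long i = 0; ; i++) {
            try {
                // Assumes nothing is listening on this port, so connect fails.
                SSLSocketFactory.getDefault().createSocket("localhost", 9).close();
            } catch (Exception expected) {
                // ignore; the point is what happens to native memory on failure
            }
            if (i % 10000 == 0) System.out.println("attempts: " + i);
        }
    }
}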
Nice work. I'd focused on real-world scenarios, which mostly had successful connections.
Sorry for the late join-in. We also noticed this happening in production. Our services only make outgoing connections, and usually the memory usage spikes when the targets time out the HTTP request or fail to connect in general. While testing I also noticed a weird trend, which may or may not be related: the memory usage, and especially the build-up of "leaked" memory, is much lower on Oracle JDK than on AdoptOpenJDK (both 11). This could be a coincidence, but the pattern we observed was that the AdoptOpenJDK instances would need to be restarted or get OOM-killed every 10-15 hours, whereas the Oracle ones could run for a couple of days without much trouble (still leaking memory, of course). I have set up a proper test environment with test conditions as similar as possible, and will report back if that wasn't just a fluke in our observations.
I've included below the source code of a simple test that creates 100k localhost SSL connections.
If it is run without conscrypt, it uses approx 330MB of Linux res mem. With conscrypt, it uses approx 3.7GB res mem, despite Xmx being set to 512M.
I've tested this both on Ubuntu 18 and CentOS 7, on two different machines. One machine is an Ubuntu VM with 2 virtual cores on an Apple laptop with 6 cores, and the other has dual Xeons, each with 8 cores. Both tests were with OpenJDK 14.0.1 as downloaded from https://jdk.java.net/14/ and with conscrypt-openjdk-2.4.0-linux-x86_64 from the Maven repo.
(Note that this memory leak happens when testing with both conscrypt 2.4.0 and 2.2.1.)
To make the code run, a keystore must first be created using the command:
keytool -genkey -keyalg EC -keystore keystore -groupname secp256r1 -alias localhost -keypass password -storepass password -dname "CN=localhost,OU=X,O=X,L=X,S=X,C=X"
The conscrypt jar has to be available, which can be retrieved with
wget https://repo1.maven.org/maven2/org/conscrypt/conscrypt-openjdk/2.4.0/conscrypt-openjdk-2.4.0-linux-x86_64.jar
The code is compiled using
javac Main.java -cp conscrypt-openjdk-2.4.0-linux-x86_64.jar
The code is run using
java -Xmx512M -cp conscrypt-openjdk-2.4.0-linux-x86_64.jar:. -Djavax.net.ssl.keyStore=keystore -Djavax.net.ssl.keyStorePassword=password -Djavax.net.ssl.trustStore=keystore -Djavax.net.ssl.trustStorePassword=password Main
The code, to be placed inside Main.java (similar in shape to the loop sketch shown earlier in this thread):
Thanks in advance for any assistance.