Monday, April 16, 2012

Finding your SQL statements run on the database through JDBC

Note: This article was written back in 2009. I recommend log4jdbc instead of jdbclogger nowadays.

Why is it of any interest at all?

The application may not log all the SQL statements it executes. If you are using frameworks/tools that generate SQL statements, it can be very handy to find out what the generated statements look like (especially if you are looking to optimize your database performance using indexes and such).


Of course, many of the frameworks out there (such as Hibernate and iBatis) do allow you to configure their logging so that you can see what SQL is being generated. However, for prepared statements, most of them fail to display the final SQL executed on your database.

Tools:

One interesting tool to get this job done is JDBC Logger. It's easy to configure, especially if you are already using a connection pool.


Configuration:

Follow the steps below to configure your connection pool (Tomcat 6.0) to use JDBC Logger:
  • download the binaries and put the jar files in your classpath
  • configure jdbc to use JDBC Logger's driver
    <Resource
          name="jdbc/mydb"
          auth="Container"
          type="javax.sql.DataSource"
          ...
          driverClassName="net.sourceforge.jdbclogger.JdbcLoggerDriver"
          url="jdbc:oracle:thin:@localhost:1521:orcl"
       />
  • create jdbclogger.properties in your application server classpath (on Tomcat 6, you can add it to your $CATALINA_HOME/lib directory)
  • add your desired jdbc driver configuration into the file created above. Example entry: jdbclogger.driver=oracle.jdbc.driver.OracleDriver
  • set logging level for JDBC Logger to DEBUG: log4j.logger.net.sourceforge.jdbclogger=DEBUG


Done. Note that the number of SQL statements logged may be a bit too much to keep in one log file. In that case, you can configure log4j so that JDBC Logger writes the SQL statements into a separate file.
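For example, with log4j 1.x, entries along these lines route the JDBC Logger output into its own rolling file (the appender name and log file path below are my own choices, not something the tool mandates):

log4j.logger.net.sourceforge.jdbclogger=DEBUG, jdbcFile
log4j.additivity.net.sourceforge.jdbclogger=false
log4j.appender.jdbcFile=org.apache.log4j.RollingFileAppender
log4j.appender.jdbcFile.File=${catalina.home}/logs/jdbc-sql.log
log4j.appender.jdbcFile.MaxFileSize=10MB
log4j.appender.jdbcFile.MaxBackupIndex=5
log4j.appender.jdbcFile.layout=org.apache.log4j.PatternLayout
log4j.appender.jdbcFile.layout.ConversionPattern=%d{ISO8601} %m%n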

SSH Tunneling & Reverse Tunneling

SSH Tunneling:

In short, SSH tunneling allows Unix users to connect to a remote computer (through ssh) and map a local port to a port on the remote computer.
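For example (hostnames and ports here are purely illustrative), the following maps local port 8080 to port 80 on the remote machine:

ssh -f -N -L 8080:localhost:80 user@remote.example.com

Any connection to localhost:8080 on your machine is now carried through the SSH connection and delivered to port 80 on remote.example.com.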

SSH Reverse Tunneling:

SSH reverse tunneling allows a Unix user to connect to a remote computer and map a port on the remote computer to a port on the local machine.
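Again with illustrative names: the following makes port 9090 on the remote machine forward back to port 22 on your local machine:

ssh -f -N -R 9090:localhost:22 user@remote.example.com

While the tunnel is up, anyone logged into remote.example.com can reach your local ssh server through localhost:9090.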

The Problem:

We need to test an external service which redirects back to one of our provided service URLs. We need a server with a public IP to do it. However, it's a pain to always compile, deploy on the public server, and test.

Instead, it will be much easier to map a remote server port to a port on my local machine, so that I can deploy locally the services to which the external service will redirect.

Assume our external server hostname is mapped to example.com, and we will use sso.example.com to provide Single Sign-On (SSO) support. From the example site, when the user wants to log in using the SSO service provided by sso.example.com, the user gets redirected to sso.example.com with certain request parameters. Once the user has successfully logged into sso.example.com, he gets redirected back to example.com with some parameters which indicate that the user logged in successfully.

Now, to test this, we need to re-deploy the services onto example.com every time we fix something. Of course, we can mount a file system on the public server that reads the deployment files from an NFS share, which can in turn be mounted to some directory on your machine.

But if the public server and your machine are in different networks and your machine does not have a public IP, then it's slightly more tricky to achieve. SSH reverse tunneling will help us come to a solution.

Assume that your machine's host name is "developer" (it can of course be an IP).

ssh -f user@example.com -R example.com:8081:developer:8080 -N

What this does is log into the remote server "example.com", open port "8081" there, and connect it to "developer:8080". So, when anyone logged into "example.com" sends a request to "http://localhost:8081", the request arrives at port 8080 on developer.

But this is no fun, since to make the request a user has to log into example.com. To fix that, we can map another port on example.com to forward everything to port "8081", which is the reverse tunneling port. To do it, we have to first log into example.com and execute the following command:

ssh -f user@example.com -L example.com:8080:localhost:8081 -N

Once this command is executed, all requests sent to example.com:8080 will be forwarded to example.com:8081, which in turn will forward everything to "developer:8080".
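To verify the whole chain, a request from any machine should now reach the locally deployed service; curl is one quick way to check (the path is illustrative):

curl http://example.com:8080/myservice

If a service is listening on developer:8080, it should receive and answer this request.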

Why not try this shortcut?

ssh -f user@example.com -R example.com:8080:developer:8080 -N 

This logs into example.com and maps port 8080 to forward everything to developer:8080. However, it only works if someone logs into example.com and sends a request to "localhost:8080"; it doesn't work if a user simply sends the request to "example.com:8080". The only way to make the shortcut mentioned above work is to configure the ssh daemon with the GatewayPorts option enabled. By default, sshd binds the reverse tunneling port to the loopback interface only, thus preventing remote hosts from using the reverse tunneling port. The default value for GatewayPorts is "no".
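If you administer example.com, enabling the option is a one-line change in the sshd configuration, followed by a restart of the ssh daemon:

# /etc/ssh/sshd_config on example.com
GatewayPorts yes

With this in place, the single -R command above binds port 8080 on the external interface as well, and the intermediate -L tunnel is no longer needed.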

Wednesday, August 17, 2011

How Java 7 Fork/Join can improve performance

Like many Java developers out there, I am also playing with the new Java 7 features. One of the features that caught my attention is the new Fork/Join framework.

I will not waste a lot of time explaining how to use Fork/Join in a program, since that is already covered well in the Fork/Join tutorial.

However, this is what I did to test it out.

Disclaimer: Java's time measurement is surely debatable, and there are quite a few talks out there that criticize micro-benchmarking as well. So please keep that in mind before expecting the results to be accurate in a production system.

First the (imaginary) problem
Assumptions:
  • Add operation is expensive
Problem:
  • Input: the program is given an array of numbers
  • Output: the program must compute the summation of the given numbers
The program first iterates over an array of SlowNumber objects and calculates their sum. The time needed to perform this operation is what is later referred to in this post as the Serial Execution Time.

After that, the program:
  • Breaks down the array of SlowNumber objects into smaller arrays with start and end indexes
  • Passes these smaller segments into separate forks
  • Keeps a small segment for itself to calculate the sum
  • Gets the sums calculated by the child-tasks and adds them together with the sum found in the previous step
The time needed by this whole process of calculating the sum in parallel is referred to as the Parallel Execution Time later in the post.

I have created a SlowNumber class that allows us to add another SlowNumber to it. Every add operation sleeps for a certain amount of time, simulating a time-consuming operation.
private class SlowNumber {

    private Long value = 0L;

    public void add(final SlowNumber number) {
        // simulate an expensive add operation
        sleep(TIME_PER_OPERATION_IN_MILLIS);
        value += number.value;
    }
    // rest of the class
}
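The sleep(...) used above is a helper that is not shown in the listing; a minimal version (my own sketch) simply wraps Thread.sleep:

private static void sleep(final long millis) {
    try {
        Thread.sleep(millis);
    } catch (InterruptedException e) {
        // restore the interrupt flag; good enough for this toy benchmark
        Thread.currentThread().interrupt();
    }
}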
Now the interesting part. We have a SlowAdd task that extends RecursiveTask<SlowNumber>. The compute method checks whether the given range of SlowNumber objects is larger than a certain size. If so, it simply forks more SlowAdd tasks, dividing the array of SlowNumber objects into smaller parts and keeping the last set of numbers for itself.

Here is how the first part of the compute method in SlowAdd looks:
class SlowAdd extends RecursiveTask<SlowNumber> {

    //...
    @Override
    protected SlowNumber compute() {
        List<ForkJoinTask<SlowNumber>> childAddTasks = new ArrayList<>();
        // start from fromIndex
        int computeBeginning = fromIndex;
        // if there are more numbers than NUMBERS_PER_FORK
        if (toIndex - fromIndex > NUMBERS_PER_FORK) {
            // create forks with their own fromIndex and toIndex to perform the add operation
            for (int i = fromIndex; i < toIndex; i += NUMBERS_PER_FORK) {
                if (i + NUMBERS_PER_FORK >= toIndex) {
                    // ok, NUMBERS_PER_FORK or fewer items are left, so don't fork
                    computeBeginning = i;
                    break;
                }
                SlowAdd slowAdd = new SlowAdd(numbers, i, i + NUMBERS_PER_FORK);
                childAddTasks.add(slowAdd);
                slowAdd.fork();
            }
        }
        // ...
Once the child tasks are created, the task continues to perform the summation on its own share of numbers. After that, it adds the summations calculated by the child tasks to its own summation to get the total.
        // now perform addition on own share of numbers
        SlowNumber total = new SlowNumber();
        for (int i = computeBeginning; i < toIndex; i++) {
            total.add(numbers[i]);
        }
        if (!childAddTasks.isEmpty()) {
            for (ForkJoinTask<SlowNumber> childAddTask : childAddTasks) {
                // add the results computed by the children
                total.add(childAddTask.join());
            }
        }
        return total;
    }
    // ...
}
Now to execute the SlowAdd task, all we need to do is:

SlowAdd slowAdd = new SlowAdd(slowNumbers, 0, slowNumbers.length);

// 'parallelism' is the desired number of worker threads; in the original
// benchmark this value came from the surrounding measurement loop
ForkJoinPool addTaskPool = new ForkJoinPool(parallelism);
final SlowNumber parallelValue = addTaskPool.invoke(slowAdd);
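For completeness, the surrounding measurement harness might look roughly like the sketch below. This is my own reconstruction rather than the original source: it assumes a getValue() accessor on SlowNumber, and the parallelism value used above corresponds to the cpus loop variable here.

// measure the serial execution time first
long start = System.currentTimeMillis();
SlowNumber serialTotal = new SlowNumber();
for (SlowNumber number : slowNumbers) {
    serialTotal.add(number);
}
long serialMillis = System.currentTimeMillis() - start;
System.out.printf("Expected total: %d, took: %d ms%n", serialTotal.getValue(), serialMillis);

// then run the parallel version with 1..N worker threads
for (int cpus = 1; cpus <= Runtime.getRuntime().availableProcessors(); cpus++) {
    ForkJoinPool addTaskPool = new ForkJoinPool(cpus);
    start = System.currentTimeMillis();
    SlowNumber parallelValue = addTaskPool.invoke(new SlowAdd(slowNumbers, 0, slowNumbers.length));
    long parallelMillis = System.currentTimeMillis() - start;
    addTaskPool.shutdown();
    System.out.printf("Found slowAdd total: %d, number of CPU: %d, took: %d ms%n",
            parallelValue.getValue(), cpus, parallelMillis);
    // the reported speed-up is serial time divided by parallel time, in percent
    System.out.printf("Speed up by: %.1f%%%n", 100.0 * serialMillis / parallelMillis);
}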

After running the example code quite a few times on an HP EliteBook 8540w running Ubuntu 11.04, the program outputs something very similar to what follows:
Expected total: 164, took: 333 ms

Found slowAdd total: 164, number of CPU: 1, took: 348 ms
Speed up by: 95.0%
Found slowAdd total: 164, number of CPU: 2, took: 313 ms
Speed up by: 106.0%
Found slowAdd total: 164, number of CPU: 3, took: 172 ms
Speed up by: 193.0%
Found slowAdd total: 164, number of CPU: 4, took: 132 ms
Speed up by: 252.0%
Note that the delay per add operation is set to 10 ms. Reducing the delay to only 1 ms results in something like the following:

Expected total: 164, took: 36 ms

Found slowAdd total: 164, number of CPU: 1, took: 42 ms
Speed up by: 85.0%
Found slowAdd total: 164, number of CPU: 2, took: 34 ms
Speed up by: 105.0%
Found slowAdd total: 164, number of CPU: 3, took: 19 ms
Speed up by: 189.0%
Found slowAdd total: 164, number of CPU: 4, took: 15 ms
Speed up by: 240.0%
Still a 240% performance improvement, despite the overhead of breaking the input into smaller pieces, creating new tasks, and the task switching done by the scheduler.

Here is a chart I made for a visual understanding of the difference:

Though this micro-benchmark looks straightforward enough to motivate all of us to convert our existing code to use Fork/Join, it should be noted that the problem I chose to demonstrate here is simple enough to be broken down into smaller problems which can be executed independently in parallel.

If you think about the popular Map/Reduce model, then the mapping part can definitely be performed in parallel using the fork/join approach to speed up performance. However, if a set of tasks are interdependent and depend on each other's results, then the outcome will surely be different.

In the worst case, if a set of tasks are dependent in a way that results in a linear dependency graph, i.e. A -> B -> C -> D, then no matter how many CPUs are available, the tasks will always be executed serially. In such cases, Fork/Join can even decrease the performance of the application, since the tasks are still executed serially but now carry the overhead of the task scheduler.

The first two columns in every bar chart prove that when the parallel execution has only one CPU, it actually decreases the performance a little. The benefit of parallel execution is only visible when more than one CPU is available.

You can download the full source code here.
Feel free to share your opinion.