Optimized Loading
The goal of optimized loading is to load the smallest amount of data required to complete a transaction in the fewest queries. Tuning JBoss loading depends on detailed knowledge of the loading process. The following sections describe the internals of the JBoss loading process and its configuration. Because tuning requires a holistic understanding of the loading system, you might need to read this chapter more than once.
A Loading Scenario
The easiest way to investigate the loading process is to look at a usage scenario. The most common scenario is to locate a collection of entities and iterate over the results, performing some operation. The following example generates an HTML table that contains all the gangsters:
public String createGangsterHtmlTable_none()
    throws FinderException
{
    StringBuffer table = new StringBuffer();
    table.append("<table>");
    // The finder returns the matching gangsters; no entity data is loaded yet.
    Collection gangsters = gangsterHome.findAll_none();
    for (Iterator iter = gangsters.iterator(); iter.hasNext();) {
        // The first accessor call on each gangster triggers its load.
        Gangster gangster = (Gangster) iter.next();
        table.append("<tr>");
        table.append("<td>").append(gangster.getName());
        table.append("</td>");
        table.append("<td>").append(gangster.getNickName());
        table.append("</td>");
        table.append("<td>").append(gangster.getBadness());
        table.append("</td>");
        table.append("</tr>");
    }
    // Close the table element opened above.
    table.append("</table>");
    return table.toString();
}
Assume that this code is called within a single transaction and all optimized loading has been disabled. At the findAll_none call, JBoss executes the following query:
SELECT t0_g.id
FROM gangster t0_g
ORDER BY t0_g.id ASC
Then, as each of the eight gangsters in the sample database is accessed, JBoss executes the following eight queries:
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=0)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=1)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=2)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=3)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=4)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=5)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=6)
SELECT name, nick_name, badness, hangout, organization
FROM gangster WHERE (id=7)
There are two problems with this scenario. First, an excessive number of queries are executed because JBoss executes one query for the findAll and one query to access each element found. The reason for this behavior has to do with the handling of query results inside the JBoss container. Although it appears that the actual entity beans selected are returned when a query is executed, JBoss really only returns the primary keys of the matching entities and does not load the entity until a method is invoked on it. This is known as the n+1 problem and is addressed with the read-ahead strategies described in the following sections.
Second, the values of unused fields are loaded needlessly. JBoss loads the hangout and organization fields, which are never accessed. (The complex contactInfo field is disabled for the sake of clarity.)
Table 11.1 shows the execution of the queries.
Table 11.1. Unoptimized Query Execution
| id | name | nick_name | badness | hangout | organization |
|---|---|---|---|---|---|
| 0 | Yojimbo | Bodyguard | 7 | 0 | Yakuza |
| 1 | Takeshi | Master | 10 | 1 | Yakuza |
| 2 | Yuriko | Four finger | 4 | 2 | Yakuza |
| 3 | Chow | Killer | 9 | 3 | Triads |
| 4 | Shogi | Lightning | 8 | 4 | Triads |
| 5 | Valentino | Pizza-Face | 4 | 5 | Mafia |
| 6 | Toni | Toothless | 2 | 6 | Mafia |
| 7 | Corleone | Godfather | 6 | 7 | Mafia |
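The query count in this scenario can be reproduced with a small simulation. The following Java sketch is only an illustration, not JBoss code: the class and method names (NPlusOneDemo, lazyLoadQueries) are hypothetical, and the SQL strings are copied from the listings above. It records one statement for the finder and one per entity, which gives n+1 statements for n entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the unoptimized scenario; not part of JBoss.
class NPlusOneDemo {

    /** Returns every SQL statement a lazily loading container would issue. */
    static List<String> lazyLoadQueries(int[] ids) {
        List<String> queries = new ArrayList<>();
        // One query for the finder: it selects primary keys only.
        queries.add("SELECT t0_g.id FROM gangster t0_g ORDER BY t0_g.id ASC");
        // One more query per entity, issued when the entity is first accessed.
        for (int id : ids) {
            queries.add("SELECT name, nick_name, badness, hangout, organization "
                    + "FROM gangster WHERE (id=" + id + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(lazyLoadQueries(ids).size() + " queries issued");
    }
}
```

For the eight gangsters in the sample database, the list holds nine statements: the finder query plus the eight single-row loads.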
Load Groups
The configuration and optimization of the loading system begins with the declaration of named load groups in the entity. A load group contains the names of CMP fields and CMR fields that have a foreign key (for example, Gangster in the Organization-Gangster example) that will be loaded in a single operation. An example of such a configuration is shown here:
<jbosscmp-jdbc>
    <enterprise-beans>
        <entity>
            <ejb-name>GangsterEJB</ejb-name>
            <!-- ... -->
            <load-groups>
                <load-group>
                    <load-group-name>basic</load-group-name>
                    <field-name>name</field-name>
                    <field-name>nickName</field-name>
                    <field-name>badness</field-name>
                </load-group>
                <load-group>
                    <load-group-name>contact info</load-group-name>
                    <field-name>nickName</field-name>
                    <field-name>contactInfo</field-name>
                    <field-name>hangout</field-name>
                </load-group>
            </load-groups>
        </entity>
    </enterprise-beans>
</jbosscmp-jdbc>
In this example, two load groups are declared: basic and contact info. Note that the load groups do not need to be mutually exclusive. For example, both of the load groups contain the nickName field. In addition to the declared load groups, JBoss automatically adds a group named * (the star group) that contains every CMP field and CMR field with a foreign key in the entity.
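One way to picture load groups is as named sets of field names. The following Java sketch models the two declared groups and the star group under that assumption. It is not how JBoss stores them internally; the class and method names are hypothetical, and the field list used for the star group is an illustrative guess based on the columns shown in this chapter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of load groups as named sets of field names.
class LoadGroupsDemo {

    static Set<String> fields(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /** Builds the groups declared for GangsterEJB plus the star group. */
    static Map<String, Set<String>> gangsterLoadGroups() {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        groups.put("basic", fields("name", "nickName", "badness"));
        groups.put("contact info", fields("nickName", "contactInfo", "hangout"));
        // Assumption: illustrative field list for the automatically added star group.
        groups.put("*", fields("name", "nickName", "badness",
                "contactInfo", "hangout", "organization"));
        return groups;
    }

    /** Returns the fields that two groups have in common. */
    static Set<String> sharedFields(Set<String> a, Set<String> b) {
        Set<String> shared = new LinkedHashSet<>(a);
        shared.retainAll(b);
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = gangsterLoadGroups();
        System.out.println(sharedFields(groups.get("basic"), groups.get("contact info")));
    }
}
```

The overlap between basic and contact info is the nickName field, and every declared group is a subset of the star group.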
Read-ahead
Optimized loading in JBoss is called read-ahead. This refers to the technique of reading the row for an entity being loaded, as well as the next several rows (hence the term read-ahead). JBoss implements two main strategies (on-find and on-load) to address the loading problems identified in the previous section. The extra data loaded during read-ahead is not immediately associated with an entity object in memory because entities are not materialized in JBoss until they are actually accessed. Instead, it is stored in the preload cache, where it remains until it is loaded into an entity or the end of the transaction occurs. The following sections describe the read-ahead strategies.
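The lifecycle of preloaded data can be sketched as a small map keyed by primary key. The following Java class is a simplified illustration of the behavior described above, not the JBoss preload cache itself; the class name and its methods (put, take, endTransaction) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-transaction preload cache; not the JBoss class.
class PreloadCacheDemo {

    // Preloaded field values, keyed by the primary key of the entity.
    private final Map<Object, Map<String, Object>> rows = new HashMap<>();

    /** Stores field values that were read ahead for the given primary key. */
    void put(Object primaryKey, Map<String, Object> fieldValues) {
        rows.put(primaryKey, fieldValues);
    }

    /**
     * Hands the preloaded values to an entity that is being materialized.
     * The entry is removed; null means nothing was preloaded for this key.
     */
    Map<String, Object> take(Object primaryKey) {
        return rows.remove(primaryKey);
    }

    /** Discards whatever was never used when the transaction ends. */
    void endTransaction() {
        rows.clear();
    }

    int size() {
        return rows.size();
    }
}
```

In this sketch an entry leaves the cache in one of two ways: it is taken when its entity is loaded, or it is dropped when the transaction ends.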
The on-find Strategy
The on-find strategy reads additional columns when the query is invoked. If the query is on-find optimized, JBoss executes the following SQL when the finder is invoked:
SELECT t0_g.id, t0_g.name, t0_g.nick_name, t0_g.badness
FROM gangster t0_g
ORDER BY t0_g.id ASC
All the required data would be in the preload cache, so no additional queries would need to be executed while iterating through the query results. This strategy is effective for queries that return a small amount of data, but it becomes very inefficient when you're trying to load a large result set into memory. Table 11.2 shows the execution of this query.
Table 11.2. on-find Optimized Query Execution
| id | name | nick_name | badness | hangout | organization |
|---|---|---|---|---|---|
| 0 | Yojimbo | Bodyguard | 7 | 0 | Yakuza |
| 1 | Takeshi | Master | 10 | 1 | Yakuza |
| 2 | Yuriko | Four finger | 4 | 2 | Yakuza |
| 3 | Chow | Killer | 9 | 3 | Triads |
| 4 | Shogi | Lightning | 8 | 4 | Triads |
| 5 | Valentino | Pizza-Face | 4 | 5 | Mafia |
| 6 | Toni | Toothless | 2 | 6 | Mafia |
| 7 | Corleone | Godfather | 6 | 7 | Mafia |
The read-ahead strategy and load-group for a query are defined in the query element. If a read-ahead strategy is not declared in the query element, the strategy declared in the entity element or defaults element is used. The on-find configuration follows:
<jbosscmp-jdbc>
    <enterprise-beans>
        <entity>
            <ejb-name>GangsterEJB</ejb-name>
            <!--...-->
            <query>
                <query-method>
                    <method-name>findAll_onfind</method-name>
                    <method-params/>
                </query-method>
                <jboss-ql><![CDATA[
                    SELECT OBJECT(g)
                    FROM gangster g
                    ORDER BY g.gangsterId
                ]]></jboss-ql>
                <read-ahead>
                    <strategy>on-find</strategy>
                    <page-size>4</page-size>
                    <eager-load-group>basic</eager-load-group>
                </read-ahead>
            </query>
        </entity>
    </enterprise-beans>
</jbosscmp-jdbc>
One problem with the on-find strategy is that it must load additional data for every entity selected. Commonly in web applications, only a fixed number of results are rendered on a page. Because the preloaded data is valid only for the length of the transaction, and a transaction is limited to a single HTTP request, most of the preloaded data is never used. The on-load strategy discussed later in this chapter does not suffer from this problem.
The left-join read-ahead Strategy
left-join read-ahead is an enhanced on-find read-ahead strategy. It allows you to preload in one SQL query not only fields from the base instance but also related instances that can be reached from the base instance by CMR navigation. There are no limitations for the depth of CMR navigations. There are also no limitations for cardinality of CMR fields used in navigation and relationship type mapping. Both foreign key and relation table mapping styles are supported. Let's look at some examples. Entity and relationship declarations can be found in the following section.
D#findByPrimaryKey
Suppose you have an entity D. A typical SQL query generated for findByPrimaryKey would look like this:
SELECT t0_D.id, t0_D.name FROM D t0_D WHERE t0_D.id=?
Suppose that while executing findByPrimaryKey, you also want to preload two collection-valued CMR fields, bs and cs:
<query>
    <query-method>
        <method-name>findByPrimaryKey</method-name>
        <method-params>
            <method-param>java.lang.Long</method-param>
        </method-params>
    </query-method>
    <jboss-ql><![CDATA[SELECT OBJECT(o) FROM D AS o WHERE o.id = ?1]]></jboss-ql>
    <read-ahead>
        <strategy>on-find</strategy>
        <page-size>4</page-size>
        <eager-load-group>basic</eager-load-group>
        <left-join cmr-field="bs" eager-load-group="basic"/>
        <left-join cmr-field="cs" eager-load-group="basic"/>
    </read-ahead>
</query>
Each left-join element declares a relation to be eagerly loaded. The generated SQL would look like this:
SELECT t0_D.id, t0_D.name,
t1_D_bs.id, t1_D_bs.name,
t2_D_cs.id, t2_D_cs.name
FROM D t0_D
LEFT OUTER JOIN B t1_D_bs ON t0_D.id=t1_D_bs.D_FK
LEFT OUTER JOIN C t2_D_cs ON t0_D.id=t2_D_cs.D_FK
WHERE t0_D.id=?
For the D with the specific ID, you preload all its related Bs and Cs and can access those instances by loading them from the read-ahead cache, not from the database.
D#findAll
In the same way, you can optimize the findAll method on D, which selects all the Ds. A normal findAll query would look like this:
SELECT DISTINCT t0_o.id, t0_o.name FROM D t0_o ORDER BY t0_o.id DESC
To preload the relations, you simply need to add the left-join elements to the query:
<query>
    <query-method>
        <method-name>findAll</method-name>
    </query-method>
    <jboss-ql><![CDATA[SELECT DISTINCT OBJECT(o) FROM D AS o ORDER BY o.id DESC]]></jboss-ql>
    <read-ahead>
        <strategy>on-find</strategy>
        <page-size>4</page-size>
        <eager-load-group>basic</eager-load-group>
        <left-join cmr-field="bs" eager-load-group="basic"/>
        <left-join cmr-field="cs" eager-load-group="basic"/>
    </read-ahead>
</query>
Here is the generated SQL:
SELECT DISTINCT t0_o.id, t0_o.name,
t1_o_bs.id, t1_o_bs.name,
t2_o_cs.id, t2_o_cs.name
FROM D t0_o
LEFT OUTER JOIN B t1_o_bs ON t0_o.id=t1_o_bs.D_FK
LEFT OUTER JOIN C t2_o_cs ON t0_o.id=t2_o_cs.D_FK
ORDER BY t0_o.id DESC
Now the simple findAll query preloads the related B and C objects for each D object.
A#findAll
Now let's look at a more complex configuration. In this case, you want to preload instance A along with several relations:
- Its parent (self-relation), reached from A with CMR field parent
- B, reached from A with CMR field b, and the related C, reached from B with CMR field c
- B, reached from A but this time with CMR field b2, and the related C, reached from B with CMR field c
For reference, this would be the standard query:
SELECT t0_o.id, t0_o.name FROM A t0_o ORDER BY t0_o.id DESC FOR UPDATE
The following metadata describes the preloading plan:
<query>
    <query-method>
        <method-name>findAll</method-name>
    </query-method>
    <jboss-ql><![CDATA[SELECT OBJECT(o) FROM A AS o ORDER BY o.id DESC]]></jboss-ql>
    <read-ahead>
        <strategy>on-find</strategy>
        <page-size>4</page-size>
        <eager-load-group>basic</eager-load-group>
        <left-join cmr-field="parent" eager-load-group="basic"/>
        <left-join cmr-field="b" eager-load-group="basic">
            <left-join cmr-field="c" eager-load-group="basic"/>
        </left-join>
        <left-join cmr-field="b2" eager-load-group="basic">
            <left-join cmr-field="c" eager-load-group="basic"/>
        </left-join>
    </read-ahead>
</query>
The SQL query generated would look like this:
SELECT t0_o.id, t0_o.name,
t1_o_parent.id, t1_o_parent.name,
t2_o_b.id, t2_o_b.name,
t3_o_b_c.id, t3_o_b_c.name,
t4_o_b2.id, t4_o_b2.name,
t5_o_b2_c.id, t5_o_b2_c.name
FROM A t0_o
LEFT OUTER JOIN A t1_o_parent ON t0_o.PARENT=t1_o_parent.id
LEFT OUTER JOIN B t2_o_b ON t0_o.B_FK=t2_o_b.id
LEFT OUTER JOIN C t3_o_b_c ON t2_o_b.C_FK=t3_o_b_c.id
LEFT OUTER JOIN B t4_o_b2 ON t0_o.B2_FK=t4_o_b2.id
LEFT OUTER JOIN C t5_o_b2_c ON t4_o_b2.C_FK=t5_o_b2_c.id
ORDER BY t0_o.id DESC FOR UPDATE
With this configuration, you can navigate CMRs from any found instance of A without an additional database load.
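The table aliases in the generated SQL follow a visible pattern: a running number, then the path of CMR field names from the root variable, with nested left-join elements visited depth first. The following Java sketch reproduces that naming pattern from a nested left-join tree. It is inferred only from the SQL shown in this chapter and is not the JBoss implementation; LeftJoinAliasDemo, LeftJoin, and aliases are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derives table aliases from a nested left-join tree.
class LeftJoinAliasDemo {

    /** One left-join element: a CMR field name plus nested left-join children. */
    static final class LeftJoin {
        final String cmrField;
        final List<LeftJoin> children;

        LeftJoin(String cmrField, LeftJoin... children) {
            this.cmrField = cmrField;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the aliases in the order the joins appear in the SQL. */
    static List<String> aliases(String rootVariable, List<LeftJoin> joins) {
        List<String> result = new ArrayList<>();
        result.add("t0_" + rootVariable);
        collect(rootVariable, joins, result);
        return result;
    }

    // Depth-first walk: each join is numbered before its nested joins.
    private static void collect(String path, List<LeftJoin> joins, List<String> result) {
        for (LeftJoin join : joins) {
            String childPath = path + "_" + join.cmrField;
            result.add("t" + result.size() + "_" + childPath);
            collect(childPath, join.children, result);
        }
    }

    public static void main(String[] args) {
        List<LeftJoin> plan = Arrays.asList(
                new LeftJoin("parent"),
                new LeftJoin("b", new LeftJoin("c")),
                new LeftJoin("b2", new LeftJoin("c")));
        System.out.println(aliases("o", plan));
    }
}
```

For the A#findAll preloading plan, the sketch yields t0_o, t1_o_parent, t2_o_b, t3_o_b_c, t4_o_b2, and t5_o_b2_c, matching the aliases in the generated query.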
A#findMeParentGrandParent
Let's look at another example of self-relation. Suppose you want to write a method that preloads an instance, its parent, its grandparent, and its great-grandparent in one query. To do this, you would use a nested left-join declaration:
<query>
    <query-method>
        <method-name>findMeParentGrandParent</method-name>
        <method-params>
            <method-param>java.lang.Long</method-param>
        </method-params>
    </query-method>
    <jboss-ql><![CDATA[SELECT OBJECT(o) FROM A AS o WHERE o.id = ?1]]></jboss-ql>
    <read-ahead>
        <strategy>on-find</strategy>
        <page-size>4</page-size>
        <eager-load-group>*</eager-load-group>
        <left-join cmr-field="parent" eager-load-group="basic">
            <left-join cmr-field="parent" eager-load-group="basic">
                <left-join cmr-field="parent" eager-load-group="basic"/>
            </left-join>
        </left-join>
    </read-ahead>
</query>
The generated SQL would look like this:
SELECT t0_o.id, t0_o.name, t0_o.secondName, t0_o.B_FK, t0_o.B2_FK, t0_o.PARENT,
t1_o_parent.id, t1_o_parent.name,
t2_o_parent_parent.id, t2_o_parent_parent.name,
t3_o_parent_parent_parent.id, t3_o_parent_parent_parent.name
FROM A t0_o
LEFT OUTER JOIN A t1_o_parent ON t0_o.PARENT=t1_o_parent.id
LEFT OUTER JOIN A t2_o_parent_parent ON t1_o_parent.PARENT=
t2_o_parent_parent.id
LEFT OUTER JOIN A t3_o_parent_parent_parent ON t2_o_parent_parent.PARENT=
t3_o_parent_parent_parent.id
WHERE (t0_o.id = ?) FOR UPDATE
Note that if you remove the left-join metadata, only the following query is generated:
SELECT t0_o.id, t0_o.name, t0_o.secondName, t0_o.B_FK, t0_o.B2_FK, t0_o.PARENT
FROM A t0_o
WHERE (t0_o.id = ?) FOR UPDATE
The on-load Strategy
The on-load strategy block-loads additional data for several entities when an entity is loaded, starting with the requested entity and continuing with the next several entities in the order in which they were selected. This strategy is based on the theory that the results of a find or select will be accessed in forward order. When a query is executed, JBoss stores the order of the entities found in the list cache. Later, when one of the entities is loaded, JBoss uses this list to determine the block of entities to load. The number of lists stored in the cache is specified with the list-cache-max element of the entity. This strategy is also used when faulting in data not loaded by the on-find strategy.
As with the on-find strategy, you declare on-load in the read-ahead element. The on-load configuration for this example is as follows:
<jbosscmp-jdbc>
    <enterprise-beans>
        <entity>
            <ejb-name>GangsterEJB</ejb-name>
            <!-- ... -->
            <query>
                <query-method>
                    <method-name>findAll_onload</method-name>
                    <method-params/>
                </query-method>
                <jboss-ql><![CDATA[
                    SELECT OBJECT(g)
                    FROM gangster g
                    ORDER BY g.gangsterId
                ]]></jboss-ql>
                <read-ahead>
                    <strategy>on-load</strategy>
                    <page-size>4</page-size>
                    <eager-load-group>basic</eager-load-group>
                </read-ahead>
            </query>
        </entity>
    </enterprise-beans>
</jbosscmp-jdbc>
With this strategy, the query for the finder method remains unchanged:
SELECT t0_g.id
FROM gangster t0_g
ORDER BY t0_g.id ASC
However, the data will be loaded differently as you iterate through the result set. For a page size of four, JBoss will need to execute only the following two queries to load the name, nickName, and badness fields for the entities:
SELECT id, name, nick_name, badness
FROM gangster
WHERE (id=0) OR (id=1) OR (id=2) OR (id=3)
SELECT id, name, nick_name, badness
FROM gangster
WHERE (id=4) OR (id=5) OR (id=6) OR (id=7)
Table 11.3 shows the execution of these queries.
Table 11.3. on-load Optimized Query Execution
| id | name | nick_name | badness | hangout | organization |
|---|---|---|---|---|---|
| 0 | Yojimbo | Bodyguard | 7 | 0 | Yakuza |
| 1 | Takeshi | Master | 10 | 1 | Yakuza |
| 2 | Yuriko | Four finger | 4 | 2 | Yakuza |
| 3 | Chow | Killer | 9 | 3 | Triads |
| 4 | Shogi | Lightning | 8 | 4 | Triads |
| 5 | Valentino | Pizza-Face | 4 | 5 | Mafia |
| 6 | Toni | Toothless | 2 | 6 | Mafia |
| 7 | Corleone | Godfather | 6 | 7 | Mafia |
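The block selection that on-load performs can be sketched in a few lines of Java. The example below assumes the list cache is simply the ordered list of primary keys returned by the finder, and that a block starts at the requested entity and runs for page-size entries. The class and method names (OnLoadPageDemo, blockFor, whereClause) are hypothetical, and the code is an illustration rather than the JBoss implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of on-load block selection; not the JBoss implementation.
class OnLoadPageDemo {

    /** Returns the requested id and the ids that follow it, up to pageSize entries. */
    static List<Integer> blockFor(List<Integer> foundOrder, int requestedId, int pageSize) {
        int start = foundOrder.indexOf(Integer.valueOf(requestedId));
        if (start < 0) {
            throw new IllegalArgumentException("id not in the cached list: " + requestedId);
        }
        int end = Math.min(start + pageSize, foundOrder.size());
        return foundOrder.subList(start, end);
    }

    /** Builds the OR-ed primary key condition used to load one block. */
    static String whereClause(List<Integer> ids) {
        return ids.stream()
                .map(id -> "(id=" + id + ")")
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        List<Integer> foundOrder = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(whereClause(blockFor(foundOrder, 0, 4)));
        System.out.println(whereClause(blockFor(foundOrder, 4, 4)));
    }
}
```

With the eight sample gangsters and a page size of four, loading gangster 0 selects the block 0 through 3 and loading gangster 4 selects the block 4 through 7, which corresponds to the two queries shown above.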
The none Strategy
The none strategy is really an anti-strategy. This strategy causes the system to fall back to the default lazy-load code, and it specifically does not read ahead any data or remember the order of the found entities. This results in the queries and performance shown at the beginning of this chapter. You declare the none strategy with a read-ahead element. If the read-ahead element contains a page-size element or eager-load-group, it is ignored. The none strategy is declared in the following example:
<jbosscmp-jdbc>
    <enterprise-beans>
        <entity>
            <ejb-name>GangsterEJB</ejb-name>
            <!-- ... -->
            <query>
                <query-method>
                    <method-name>findAll_none</method-name>
                    <method-params/>
                </query-method>
                <jboss-ql><![CDATA[
                    SELECT OBJECT(g)
                    FROM gangster g
                    ORDER BY g.gangsterId
                ]]></jboss-ql>
                <read-ahead>
                    <strategy>none</strategy>
                </read-ahead>
            </query>
        </entity>
    </enterprise-beans>
</jbosscmp-jdbc>