Blob.js
An HTML5 Blob implementation
I'm struggling to find a way to insert LARGE images (>100MB, mostly TIFF format) into my Oracle database, using BLOB columns.
I've searched thoroughly across the web and even on StackOverflow, without being able to find an answer to this problem.
First the problem, then a short section on the relevant code (Java classes/configuration), and finally a third section showing the JUnit test I've written to test image persistence (the error occurs during its execution).
Edit: I've added a section at the end of the question describing some tests and analysis done with JConsole.
The problem
I receive a java.lang.OutOfMemoryError: Java heap space
error when using Hibernate to persist very large images/documents:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
at org.hibernate.type.descriptor.java.DataHelper.extractBytes(DataHelper.java:190)
at org.hibernate.type.descriptor.java.BlobTypeDescriptor.unwrap(BlobTypeDescriptor.java:123)
at org.hibernate.type.descriptor.java.BlobTypeDescriptor.unwrap(BlobTypeDescriptor.java:47)
at org.hibernate.type.descriptor.sql.BlobTypeDescriptor$4$1.doBind(BlobTypeDescriptor.java:101)
at org.hibernate.type.descriptor.sql.BasicBinder.bind(BasicBinder.java:91)
at org.hibernate.type.AbstractStandardBasicType.nullSafeSet(AbstractStandardBasicType.java:283)
at org.hibernate.type.AbstractStandardBasicType.nullSafeSet(AbstractStandardBasicType.java:278)
at org.hibernate.type.AbstractSingleColumnStandardBasicType.nullSafeSet(AbstractSingleColumnStandardBasicType.java:89)
at org.hibernate.persister.entity.AbstractEntityPersister.dehydrate(AbstractEntityPersister.java:2184)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2430)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2874)
at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:79)
at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:273)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:265)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:184)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:51)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1216)
at it.paoloyx.blobcrud.manager.DocumentManagerTest.testInsertDocumentVersion(DocumentManagerTest.java:929)
The code (domain objects, repository classes, configuration)
Here is the stack of technologies I'm using (from the DB up to the business-logic tier). I use JDK 6.
- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
- ojdbc6.jar (for 11.2.0.3 release)
- Hibernate 4.0.1 Final
- Spring 3.1.GA RELEASE
I have two domain classes, mapped in a one-to-many fashion. A DocumentVersion has many DocumentData instances, each of which can represent different binary content for the same DocumentVersion.
Relevant extract from the DocumentVersion class:
@Entity
@Table(name = "DOCUMENT_VERSION")
public class DocumentVersion implements Serializable {
private static final long serialVersionUID = 1L;
private Long id;
private Set<DocumentData> otherDocumentContents = new HashSet<DocumentData>(0);
@Id
@GeneratedValue(strategy = GenerationType.TABLE)
@Column(name = "DOV_ID", nullable = false)
public Long getId() {
return id;
}
@OneToMany
@Cascade({ CascadeType.SAVE_UPDATE })
@JoinColumn(name = "DOD_DOCUMENT_VERSION")
public Set<DocumentData> getOtherDocumentContents() {
return otherDocumentContents;
}
Relevant extract from the DocumentData class:
@Entity
@Table(name = "DOCUMENT_DATA")
public class DocumentData {
private Long id;
/**
* The binary content (java.sql.Blob)
*/
private Blob binaryContent;
@Id
@GeneratedValue(strategy = GenerationType.TABLE)
@Column(name = "DOD_ID", nullable = false)
public Long getId() {
return id;
}
@Lob
@Column(name = "DOD_CONTENT")
public Blob getBinaryContent() {
return binaryContent;
}
Here are my Spring and Hibernate configuration main parameters:
<bean id="sessionFactory"
class="org.springframework.orm.hibernate4.LocalSessionFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="packagesToScan" value="it.paoloyx.blobcrud.model" />
<property name="hibernateProperties">
<props>
<prop key="hibernate.dialect">org.hibernate.dialect.Oracle10gDialect</prop>
<prop key="hibernate.hbm2ddl.auto">create</prop>
<prop key="hibernate.jdbc.batch_size">0</prop>
<prop key="hibernate.jdbc.use_streams_for_binary">true</prop>
</props>
</property>
</bean>
<bean class="org.springframework.orm.hibernate4.HibernateTransactionManager"
id="transactionManager">
<property name="sessionFactory" ref="sessionFactory" />
</bean>
<tx:annotation-driven transaction-manager="transactionManager" />
My datasource definition:
<bean class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close" id="dataSource">
<property name="driverClassName" value="${database.driverClassName}" />
<property name="url" value="${database.url}" />
<property name="username" value="${database.username}" />
<property name="password" value="${database.password}" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="testWhileIdle" value="true" />
<property name="timeBetweenEvictionRunsMillis" value="1800000" />
<property name="numTestsPerEvictionRun" value="3" />
<property name="minEvictableIdleTimeMillis" value="1800000" />
<property name="validationQuery" value="${database.validationQuery}" />
</bean>
where properties are taken from here:
database.driverClassName=oracle.jdbc.OracleDriver
database.url=jdbc:oracle:thin:@localhost:1521:devdb
database.username=blobcrud
database.password=blobcrud
database.validationQuery=SELECT 1 from dual
I've got a service class, that delegates to a repository class:
@Transactional
public class DocumentManagerImpl implements DocumentManager {
DocumentVersionDao documentVersionDao;
public void setDocumentVersionDao(DocumentVersionDao documentVersionDao) {
this.documentVersionDao = documentVersionDao;
}
and now the relevant extracts from repository classes:
public class DocumentVersionDaoHibernate implements DocumentVersionDao {
@Autowired
private SessionFactory sessionFactory;
@Override
public DocumentVersion saveOrUpdate(DocumentVersion record) {
this.sessionFactory.getCurrentSession().saveOrUpdate(record);
return record;
}
The JUnit test that causes the error
If I run the following unit test, I get the aforementioned error (java.lang.OutOfMemoryError: Java heap space):
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath*:META-INF/spring/applicationContext*.xml" })
@Transactional
public class DocumentManagerTest {
@Autowired
protected DocumentVersionDao documentVersionDao;
@Autowired
protected SessionFactory sessionFactory;
@Test
public void testInsertDocumentVersion() throws SQLException {
// Original mock document content
DocumentData dod = new DocumentData();
// image.tiff is approx. 120MB
File veryBigFile = new File("/Users/paoloyx/Desktop/image.tiff");
try {
Session session = this.sessionFactory.getCurrentSession();
InputStream inStream = FileUtils.openInputStream(veryBigFile);
Blob blob = Hibernate.getLobCreator(session).createBlob(inStream, veryBigFile.length());
dod.setBinaryContent(blob);
} catch (IOException e) {
e.printStackTrace();
dod.setBinaryContent(null);
}
// Save a document version linked to previous document contents
DocumentVersion dov = new DocumentVersion();
dov.getOtherDocumentContents().add(dod);
documentVersionDao.saveOrUpdate(dov);
this.sessionFactory.getCurrentSession().flush();
Long insertedId = dov.getId();
// Clear session, then try retrieval
this.sessionFactory.getCurrentSession().clear();
DocumentVersion dbDov = documentVersionDao.findByPK(insertedId);
Assert.assertNotNull("The document version returned for id " + insertedId + " is null", dbDov);
Assert.assertNotNull("The retrieved document version has no additional contents associated with it", dbDov.getOtherDocumentContents());
Assert.assertEquals("The number of secondary contents does not match the number saved", 1, dbDov.getOtherDocumentContents().size());
}
The same code works against a PostgreSQL 9 installation: the image is actually written to the database.
Debugging my code, I found that the PostgreSQL JDBC driver writes to the database using a buffered output stream... while the Oracle OJDBC driver tries to allocate, all at once, the whole byte[] representing the image.
From the error stack:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
Is the error due to this behavior?
Can anyone give me some insights on this problem?
Thanks everyone.
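For reference, the same insert can be done with plain JDBC streaming, so the driver can read the file in chunks instead of materializing one huge byte[] (a minimal sketch against the DOCUMENT_DATA table from above, using a direct connection; try-with-resources is Java 7 syntax, so on JDK 6 the resources would be closed in a finally block):
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StreamingBlobInsert {
    public static void main(String[] args) throws Exception {
        File veryBigFile = new File("/Users/paoloyx/Desktop/image.tiff");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:devdb", "blobcrud", "blobcrud");
             InputStream in = new FileInputStream(veryBigFile);
             PreparedStatement ps = conn.prepareStatement(
                "insert into DOCUMENT_DATA (DOD_ID, DOD_CONTENT) values (?, ?)")) {
            ps.setLong(1, 1L);
            // The driver pulls bytes from the stream; the file never sits fully in the heap
            ps.setBinaryStream(2, in, veryBigFile.length());
            ps.executeUpdate();
        }
    }
}
Whether Hibernate takes an equivalent streaming path depends on the dialect and driver, which is what the tests below try to measure.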
Memory Tests with JConsole
Thanks to the suggestions received on my question, I've run some simple tests to show the memory usage of my code with two different JDBC drivers, one for PostgreSQL and one for Oracle.
Test setup:
- The test has been conducted using the JUnit test described in the previous section.
- JVM heap size has been set to 512MB, using the parameter -Xmx512m
- For Oracle database, I've used ojdbc6.jar driver
- For Postgres database, I've used 9.0-801.jdbc3 driver (via Maven)
First test, with a file of approx 150MB
In this first test, both Oracle and PostgreSQL passed (this is BIG news).
The file size is about 1/3 of the available JVM heap.
Here are the pictures of JVM memory consumption:
[JConsole screenshot: Oracle, 512MB heap size, 150MB file]
[JConsole screenshot: PostgreSQL, 512MB heap size, 150MB file]
Second test, with a file of approx 485MB
In this second test, only PostgreSQL passed the test; Oracle failed it.
The file size is very near the available JVM heap size.
Here are the pictures of JVM memory consumption:
[JConsole screenshot: Oracle, 512MB heap size, 485MB file]
[JConsole screenshot: PostgreSQL, 512MB heap size, 485MB file]
Analysis of the tests:
It seems that the PostgreSQL driver handles memory without surpassing a certain threshold, while the Oracle driver behaves very differently.
I honestly can't explain why the Oracle JDBC driver leads to the error (the same java.lang.OutOfMemoryError: Java heap space) when the file is sized near the available heap space.
Can anyone give me more insights?
Thanks a lot for your help :)
Source: (StackOverflow)
While I was trying to create a workaround for Chrome not supporting Blobs in IndexedDB, I discovered that I could read an image through AJAX as an ArrayBuffer, store it in IndexedDB, extract it, convert it to a Blob and then show it in an element, using the following code:
var xhr = new XMLHttpRequest(), newphoto;
xhr.open("GET", "photo1.jpg", true);
xhr.responseType = "arraybuffer";
xhr.addEventListener("load", function () {
  if (xhr.status === 200) {
    newphoto = xhr.response;
    /* store "newphoto" in IndexedDB */
    ...
  }
});
xhr.send();
document.getElementById("show_image").onclick=function() {
var store = db.transaction("files", "readonly").objectStore("files").get("image1");
store.onsuccess = function() {
var URL = window.URL || window.webkitURL;
var oMyBlob = new Blob([store.result.image], { "type" : "image\/jpg" });
var docURL = URL.createObjectURL(oMyBlob);
var elImage = document.getElementById("photo");
elImage.setAttribute("src", docURL);
URL.revokeObjectURL(docURL);
}
}
This code works fine. But if I try the same process, this time loading a video (.mp4), I can't show it:
...
var oMyBlob = new Blob([store.result.image], { "type" : "video\/mp4" });
var docURL = URL.createObjectURL(oMyBlob);
var elVideo = document.getElementById("showvideo");
elVideo.setAttribute("src", docURL);
...
<video id="showvideo" controls ></video>
...
Even if I use xhr.responseType = "blob", and instead of storing the blob in IndexedDB I try to show it immediately after loading it, it still does not work!
xhr.responseType = "blob";
xhr.addEventListener("load", function () {
  if (xhr.status === 200) {
    newvideo = xhr.response;
    var docURL = URL.createObjectURL(newvideo);
    var elVideo = document.getElementById("showvideo");
    elVideo.setAttribute("src", docURL);
    URL.revokeObjectURL(docURL);
  }
});
The next step was trying to do the same thing for PDF files, but I'm stuck with video files!
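One detail worth isolating (an observation, not a confirmed fix): in these snippets URL.revokeObjectURL(docURL) runs immediately after the src is set, so the URL may already be dead by the time the <video> element starts fetching its data; an image can often get away with this because it is fetched in one shot, while video is fetched progressively. A minimal sketch that defers the revocation:
xhr.responseType = "blob";
xhr.addEventListener("load", function () {
  if (xhr.status === 200) {
    var docURL = URL.createObjectURL(xhr.response);
    var elVideo = document.getElementById("showvideo");
    // Revoke only once the element has buffered enough to play through
    elVideo.addEventListener("canplaythrough", function onReady() {
      elVideo.removeEventListener("canplaythrough", onReady);
      URL.revokeObjectURL(docURL);
    });
    elVideo.setAttribute("src", docURL);
  }
});
xhr.send();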
Source: (StackOverflow)
How can I insert an image in MySQL and then retrieve it using PHP?
I have limited experience in either area, and I could use a little code to get me started in figuring this out.
Source: (StackOverflow)
I have been pretty much fascinated by these two data types. According to the Oracle docs, they are presented as follows:
BLOB: Variable-length binary large object string that can be up to 2GB (2,147,483,647) long. Primarily intended to hold non-traditional data, such as voice or mixed media. BLOB strings are not associated with a character set, as with FOR BIT DATA strings.
CLOB: Variable-length character large object string that can be up to 2GB (2,147,483,647) long. A CLOB can store single-byte character strings or multibyte, character-based data. A CLOB is considered a character string.
But what I don't know is whether there is any difference between the two from a DB2 versus Oracle perspective. What are the differences between DB2 CLOB and Oracle CLOB, and between DB2 BLOB and Oracle BLOB? What is the maximum size of each in DB2 and Oracle? Is it just 2GB?
Any help is greatly appreciated!
Source: (StackOverflow)
How do you set the name of a blob file in JavaScript when force downloading it through window.location?
function newFile(data) {
var json = JSON.stringify(data);
var blob = new Blob([json], {type: "octet/stream"});
var url = window.URL.createObjectURL(blob);
window.location.assign(url);
}
Running the above code downloads a file instantly, without a page refresh, with a name that looks like bfefe410-8d9c-4883-86c5-d76c50a24a1d. I want to set the filename to my-download.json instead.
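A common workaround is to route the download through a temporary <a> element, whose HTML5 download attribute carries the filename - something window.location.assign cannot express (a minimal sketch):
function newFile(data, filename) {
  var json = JSON.stringify(data);
  var blob = new Blob([json], {type: "octet/stream"});
  var url = window.URL.createObjectURL(blob);
  var a = document.createElement("a");
  a.href = url;
  a.download = filename; // e.g. "my-download.json"
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  window.URL.revokeObjectURL(url);
}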
Source: (StackOverflow)
I am developing an application where I receive image data stored in a Uint8Array.
I then transform this data into a Blob and build the image URL.
Simplified code to get data from server:
var array;
var req = new XMLHttpRequest();
var url = "img/" + uuid + "_" +segmentNumber+".jpg";
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function (oEvent) {
  var data = req.response;
  array = new Uint8Array(data); // unsigned, matching the Uint8Array described above
};
req.send(null);
Constructor:
out = new Blob([data], {type : datatype} );
The Blob constructor is causing the problem. It works fine in all browsers except Chrome, on both mobile and desktop devices.
Use of Blob:
// Receive Uint8Array using AJAX here
// array = ...
// Create BLOB
var jpeg = new Blob( [array.buffer], {type : "image/jpeg"});
var url = DOMURL.createObjectURL(jpeg);
img.src = url;
Desktop Chrome gives me a warning: ArrayBuffer values are deprecated in Blob Constructor. Use ArrayBufferView instead.
Mobile Chrome gives me an error: illegal constructor.
If I change the constructor to work in Chrome, it fails in the other browsers.
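A sketch of one way to satisfy both sides (assuming the warning and the error share the same root cause): pass the typed array itself, which is an ArrayBufferView, and fall back to the prefixed BlobBuilder API on the older Chrome builds where new Blob(...) itself throws:
function makeBlob(view, type) {
  try {
    // Passing the view (not view.buffer) is what the deprecation warning asks for
    return new Blob([view], {type: type});
  } catch (e) {
    // Older Chrome/Android builds only expose the prefixed BlobBuilder
    var Builder = window.BlobBuilder || window.WebKitBlobBuilder || window.MozBlobBuilder;
    var bb = new Builder();
    bb.append(view.buffer);
    return bb.getBlob(type);
  }
}

var jpeg = makeBlob(array, "image/jpeg"); // array is the Uint8Array from above
img.src = DOMURL.createObjectURL(jpeg);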
Source: (StackOverflow)
I have an SQLite table that contains a BLOB column, but I need to do a size/length check on the blob. How do I do that?
According to some documentation I did find, using length(blob) won't work, because length() only works on texts and will stop counting after the first NULL. My empirical tests have shown this to be true.
I'm using SQLite 3.4.2
Updates:
So as of SQLite 3.7.6 it appears that the length() function returns the correct value for blobs - I checked various SQLite change logs, but did not see in which version this was corrected.
From SQLite 3.7.6:
payload_id|length(payload)|length(hex(payload))/2
1807913|194|194
1807914|171|171
The documentation was changed to reflect this.
length(X) The length(X) function returns the length of X in characters if X is
a string, or in bytes if X is a blob. If X is NULL then length(X) is
NULL. If X is numeric then length(X) returns the length of a string
representation of X.
Source: (StackOverflow)
Currently, I've got images (max. 6MB) stored as BLOBs in an InnoDB table.
As the size of the data grows, the nightly backup is getting slower and slower, hindering normal performance.
So the binary data needs to go to the file system (pointers to the files will be kept in the DB).
The data has a tree-like relation:
- main site
  - user_0
    - album_0
    - album_1
    - album_n
  - user_1
  - user_n
etc...
Now I want the data to be distributed evenly through the directory structure. How should I accomplish this?
I guess I could try MD5('userId, albumId, imageId');
and slice up the resulting string to get my directory path:
/var/imageStorage/f/347e/013b/c042/51cf/985f7ad0daa987d.jpeg
This would allow me to map the first character to a server and evenly distribute the directory structure over multiple servers.
This would however not keep images organised per user, likely spreading the images for 1 album over multiple servers.
My question is:
What is the best way to store the image data in the file system in a balanced way, while keeping user/album data together?
Am I thinking in the right direction? Or is this the wrong way of doing things altogether?
Update:
I will go for the md5(user_id) string slicing for the split at the highest level, and then put all user data in that same bucket. This will ensure an even distribution of data while keeping user data stored close together (see the sketch after the layout below).
/var
- imageStorage
- f/347e/013b
- f347e013bc04251cf985f7ad0daa987d
- 0
- album1_10
- picture_1.jpeg
- 1
- album1_1
- picture_2.jpeg
- picture_3.jpeg
- album1_11
- picture_n.jpeg
- n
- album1_n
I think I will use the albumId split up from behind (I like that idea!) so as to keep the number of albums per directory smaller (although it won't be necessary for most users).
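For illustration, a minimal sketch of that path derivation (the fan-out depth and directory names are arbitrary assumptions, not a recommendation):
import java.security.MessageDigest;

public class ImagePath {
    static String pathFor(long userId, long albumId, String fileName) throws Exception {
        // md5(user_id) gives an evenly distributed hex string to slice into directories
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(Long.toString(userId).getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        // The first char can pick the server; the next slices build the fan-out.
        // All of a user's albums stay under the same bucket.
        return "/var/imageStorage/" + hex.charAt(0) + "/" + hex.substring(1, 5)
                + "/" + hex.substring(5, 9) + "/" + userId + "/" + albumId + "/" + fileName;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathFor(42L, 7L, "picture_1.jpeg"));
    }
}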
Thanks!
Source: (StackOverflow)
Sorry for the noob question, but what does the circled button mean, and how can I view the content of a blob?
Source: (StackOverflow)
I was studying Blobs, and I noticed that when you have an ArrayBuffer, you can easily convert this to a Blob as follows:
var dataView = new DataView(arrayBuffer);
var blob = new Blob([dataView], { type: mimeString });
The question I have now is: is it possible to go from a Blob to an ArrayBuffer?
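For the reverse direction, a minimal sketch using FileReader (the read is asynchronous, so the bytes arrive in a callback):
var reader = new FileReader();
reader.onload = function () {
  var arrayBuffer = reader.result; // the Blob's bytes, back as an ArrayBuffer
  // ... use arrayBuffer here ...
};
reader.readAsArrayBuffer(blob);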
Source: (StackOverflow)
When I search the web for inserting BLOBs into an Oracle database with the JDBC thin driver, most of the web pages suggest a 3-step approach:
- insert an empty_blob() value
- select the row with for update
- insert the real value
This works fine for me, here is an example:
Connection oracleConnection = ...
byte[] testArray = ...
PreparedStatement ps = oracleConnection.prepareStatement(
"insert into test (id, blobfield) values(?, empty_blob())");
ps.setInt(1, 100);
ps.executeUpdate();
ps.close();
ps = oracleConnection.prepareStatement(
"select blobfield from test where id = ? for update");
ps.setInt(1, 100);
OracleResultSet rs = (OracleResultSet) ps.executeQuery();
if (rs.next()) {
BLOB blob = (BLOB) rs.getBLOB(1);
OutputStream outputStream = blob.setBinaryStream(0L);
InputStream inputStream = new ByteArrayInputStream(testArray);
byte[] buffer = new byte[blob.getBufferSize()];
int byteread = 0;
while ((byteread = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, byteread);
}
outputStream.close();
inputStream.close();
}
There are some web pages where the authors suggest a simpler 1-step solution. The previous example with this solution:
Connection oracleConnection = ...
byte[] testArray = ...
PreparedStatement ps = oracleConnection.prepareStatement(
"insert into test(id, blobfield) values(?, ?)");
BLOB blob = BLOB.createTemporary(oracleConnection, false, BLOB.DURATION_SESSION);
OutputStream outputStream = blob.setBinaryStream(0L);
InputStream inputStream = new ByteArrayInputStream(testArray);
byte[] buffer = new byte[blob.getBufferSize()];
int byteread = 0;
while ((byteread = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, byteread);
}
outputStream.close();
inputStream.close();
ps.setInt(1, 100);
ps.setBlob(2, blob);
ps.executeUpdate();
ps.close();
The second version is much easier, so my question is: what is the point of the first (popular) solution? Is there (or was there) some kind of constraint on the second solution (Oracle server version, JDBC driver version, size of the blob, ...)? Is the first solution better (speed, memory consumption, ...)? Any reasons for not using the simpler second approach?
The exact same question applies to CLOB fields.
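For completeness, a third variant that avoids Oracle-specific classes altogether (a minimal sketch using standard JDBC streaming; whether it stays memory-friendly for very large values depends on the driver version):
PreparedStatement ps = oracleConnection.prepareStatement(
    "insert into test (id, blobfield) values (?, ?)");
InputStream inputStream = new ByteArrayInputStream(testArray);
ps.setInt(1, 100);
// The driver reads from the stream instead of requiring a BLOB locator
ps.setBinaryStream(2, inputStream, testArray.length);
ps.executeUpdate();
ps.close();
inputStream.close();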
Source: (StackOverflow)
I've got some byte[] fields in my entities, e.g.:
@Entity
public class ServicePicture implements Serializable {
private static final long serialVersionUID = 2877629751219730559L;
// seam-gen attributes (you should probably edit these)
@Id
@GeneratedValue
private Long id;
private String description;
@Lob
@Basic(fetch = FetchType.LAZY)
private byte[] picture;
In my database schema the field is set to BLOB, so this should be fine. Anyway: every time I try to insert a picture or PDF - nothing bigger than 1MB - I only receive this:
16:52:27,327 WARN [JDBCExceptionReporter] SQL Error: 0, SQLState: 22001
16:52:27,327 ERROR [JDBCExceptionReporter] Data truncation: Data too long for column 'picture' at row 1
16:52:27,328 ERROR [STDERR] javax.persistence.PersistenceException: org.hibernate.exception.DataException: could not insert: [de.ac.dmg.productfinder.entity.ServicePicture]
16:52:27,328 ERROR [STDERR] at org.hibernate.ejb.AbstractEntityManagerImpl.throwPersistenceException(AbstractEntityManagerImpl.java:629)
16:52:27,328 ERROR [STDERR] at org.hibernate.ejb.AbstractEntityManagerImpl.persist(AbstractEntityManagerImpl.java:218)
16:52:27,328 ERROR [STDERR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:52:27,328 ERROR [STDERR] at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
16:52:27,328 ERROR [STDERR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
16:52:27,328 ERROR [STDERR] at java.lang.reflect.Method.invoke(Unknown Source)
16:52:27,328 ERROR [STDERR] at org.jboss.seam.persistence.EntityManagerInvocationHandler.invoke(EntityManagerInvocationHandler.java:46)
16:52:27,328 ERROR [STDERR] at $Proxy142.persist(Unknown Source)
I've checked my MySQL cnf and the max_allowed param is set to 16M - am I missing something?
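One thing to rule out (an assumption based on MySQL's documented type sizes, not something visible in the stack trace): a plain MySQL BLOB column holds at most 64KB, and "Data too long for column" is exactly the error that exceeding the column size produces; MEDIUMBLOB (16MB) or LONGBLOB (4GB) is needed for bigger files. A minimal sketch forcing this from the mapping:
@Lob
@Basic(fetch = FetchType.LAZY)
// LONGBLOB so the column can hold files well beyond the 64KB cap of plain BLOB
@Column(columnDefinition = "LONGBLOB")
private byte[] picture;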
Source: (StackOverflow)
I have been trying to display a PDF file which I am getting as a blob from an $http.post response. The PDF must be displayed within the app, using <embed src> for example.
I came across a couple of Stack Overflow posts, but somehow my example doesn't seem to work.
JS:
According to this doc, I went on and tried:
$http.post('/postUrlHere',{myParams}).success(function (response) {
var file = new Blob([response], {type: 'application/pdf'});
var fileURL = URL.createObjectURL(file);
$scope.content = fileURL;
});
Now, from what I understand, fileURL creates a temporary URL that the blob can use as a reference.
HTML:
<embed src="{{content}}" width="200" height="200"></embed>
I am not sure how to handle this in Angular; the ideal situation would be to (1) assign it to a scope, (2) 'prepare/rebuild' the blob into a PDF, (3) pass it to the HTML using <embed>, because I want to display it within the app.
I have been researching for more than a day now, but somehow I can't seem to understand how this works in Angular... and let's just assume the PDF viewer libraries out there aren't an option.
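Two things commonly matter in this setup (a sketch under assumptions, not a confirmed fix: the response has to be requested as binary instead of being decoded as text, and AngularJS refuses blob: URLs in src bindings unless they are marked trusted through the $sce service, which would need to be injected into the controller):
$http.post('/postUrlHere', {myParams}, {responseType: 'arraybuffer'})
  .success(function (response) {
    // Build the Blob from the raw bytes, then mark the object URL as trusted
    var file = new Blob([response], {type: 'application/pdf'});
    var fileURL = URL.createObjectURL(file);
    $scope.content = $sce.trustAsResourceUrl(fileURL);
  });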
Source: (StackOverflow)