[FRIAM] Large Memory Java Applications

Dan Kunkle dan at redfish.com
Fri May 26 11:43:46 EDT 2006


For my thesis work, I'm looking at doing very large graph search to
find optimal solutions to Rubik's cube. As a component of our
approach, we're planning to do a full breadth-first search of a graph
with over 1 trillion nodes.

To do this, we're using the combined disk space of a cluster of
machines. Of course, disks are slow if you use random access, so you
have to convert all of your algorithms to using streaming access only
(depending on the algorithm, this can be very tricky). In fact, with
today highly non-uniform memory hierarchies (L1 cache -> L2 cache ->
... -> main memory), even using RAM by way of random access can make
large computations infeasible. So, these same streaming-access-only
algorithms can also provide a massive speedup when disk isn't used.
Further, if you use streaming access only with disk, the aggregate
disk bandwidth can equal that of the bandwidth to main memory. So, if
you have a cluster of 100 machines with 300 GB of disk each, this
means your cluster works like a single machine with 30 TB of RAM.

Depending on whether you have an explicit graph (the whole thing
stored in memory) or an implicit graph (defined by a function
providing the neighbor nodes of any given node), the techniques may
differ. I'm more familiar with implicit graph search, but it sounds
like you may be working with explicit graphs.

Here is a reference from each camp dealing with very large searches in
external memory:

AI community -- implicit graph searches:
Korf and Schultze, Large-Scale Parallel Breadth-First Search
http://www.cs.ualberta.ca/~bulitko/F05/CMPUT651/papers/pbfs.pdf

Explicit graph manipulation:
STXXL, (Standard Template Library for Extra Large Data Sets (C++ library))
http://stxxl.sourceforge.net/

I don't have any Java specific advice, but the key of using only
streaming access with large data structures is language independent.

-Dan

-- 
[ http://www.ccs.neu.edu/home/kunkle/ ]

On 5/26/06, Owen Densmore <owen at backspaces.net> wrote:
> I started a discussion on large memory java applications on Java
> Lobby.  Basically how to approach graph problems with 250 million
> nodes.  The response was surprisingly good!
>    http://www.javalobby.org/java/forums/t72726.html
>
> It seems there are two approaches:
> - Hardware: build a large server or a cluster and use interesting
> distributed solutions that have become available for this
> - Software: use interesting streaming|piping solutions, or possibly
> DB-like disk solutions that can be tuned for your application.
>
> Has anyone here tackled truly large problems like this successfully?
>
>      -- Owen
>
> Owen Densmore
> http://backspaces.net - http://redfish.com - http://friam.org
>
>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
>



More information about the Friam mailing list