[FRIAM] Large Memory Java Applications

Owen Densmore owen at backspaces.net
Fri May 26 22:21:42 EDT 2006


Hi Dan, good to hear from you.

So, given your understanding of RedFish .. would you recommend some  
sort of "super computing" stunt for us?  Either a cluster or a  
biggish multiprocessor 64-bit linux/solaris box?

We've been considering such a step, or hoping that there becomes  
available some sort of "compute farm" much like the Blender render  
farm we're using for great visualization work.

     -- Owen

Owen Densmore
http://backspaces.net - http://redfish.com - http://friam.org


On May 26, 2006, at 9:43 AM, Dan Kunkle wrote:

> For my thesis work, I'm looking at doing very large graph search to
> find optimal solutions to Rubik's cube. As a component of our
> approach, we're planning to do a full breadth-first search of a graph
> with over 1 trillion nodes.
>
> To do this, we're using the combined disk space of a cluster of
> machines. Of course, disks are slow if you use random access, so you
> have to convert all of your algorithms to using streaming access only
> (depending on the algorithm, this can be very tricky). In fact, with
> today highly non-uniform memory hierarchies (L1 cache -> L2 cache ->
> ... -> main memory), even using RAM by way of random access can make
> large computations infeasible. So, these same streaming-access-only
> algorithms can also provide a massive speedup when disk isn't used.
> Further, if you use streaming access only with disk, the aggregate
> disk bandwidth can equal that of the bandwidth to main memory. So, if
> you have a cluster of 100 machines with 300 GB of disk each, this
> means your cluster works like a single machine with 30 TB of RAM.
>
> Depending on whether you have an explicit graph (the whole thing
> stored in memory) or an implicit graph (defined by a function
> providing the neighbor nodes of any given node), the techniques may
> differ. I'm more familiar with implicit graph search, but it sounds
> like you may be working with explicit graphs.
>
> Here is a reference from each camp dealing with very large searches in
> external memory:
>
> AI community -- implicit graph searches:
> Korf and Schultze, Large-Scale Parallel Breadth-First Search
> http://www.cs.ualberta.ca/~bulitko/F05/CMPUT651/papers/pbfs.pdf
>
> Explicit graph manipulation:
> STXXL, (Standard Template Library for Extra Large Data Sets (C++  
> library))
> http://stxxl.sourceforge.net/
>
> I don't have any Java specific advice, but the key of using only
> streaming access with large data structures is language independent.
>
> -Dan
>
> -- 
> [ http://www.ccs.neu.edu/home/kunkle/ ]
>
> On 5/26/06, Owen Densmore <owen at backspaces.net> wrote:
>> I started a discussion on large memory java applications on Java
>> Lobby.  Basically how to approach graph problems with 250 million
>> nodes.  The response was surprisingly good!
>>    http://www.javalobby.org/java/forums/t72726.html
>>
>> It seems there are two approaches:
>> - Hardware: build a large server or a cluster and use interesting
>> distributed solutions that have become available for this
>> - Software: use interesting streaming|piping solutions, or possibly
>> DB-like disk solutions that can be tuned for your application.
>>
>> Has anyone here tackled truly large problems like this successfully?
>>
>>      -- Owen
>>
>> Owen Densmore
>> http://backspaces.net - http://redfish.com - http://friam.org
>>
>>
>>
>> ============================================================
>> FRIAM Applied Complexity Group listserv
>> Meets Fridays 9a-11:30 at cafe at St. John's College
>> lectures, archives, unsubscribe, maps at http://www.friam.org
>>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org




More information about the Friam mailing list