19 Nov 2010

Devoxx - Day 4

The last full conference day of Devoxx was again packed with very interesting talks of various kinds. It started with a keynote about the roadmap of JEE 7. In summary, we can expect some smooth refinements of the platform (except maybe the out-of-the-box support for virtualization). Here are our impressions of the talks on Thursday. Please expect our summary blog post on Monday, since we are all in a rush now to get our things done and to catch trains, planes etc. We hope you enjoyed the blog flood so far ;-)

# Designing Java Systems to Operate at a Cloud Scale (George Reese)

The talk focused on how to architect cloud applications in
general. The main tips given were:

- Follow best practices when developing applications:
  - No hard-coding of IP addresses
  - Try to avoid keeping state on the server (failover, scalability, performance)
  - Use caching where appropriate (static content, DB)
- Choose the appropriate DB (NoSQL or relational DB?)
  - Don’t buy into the NoSQL hype
  - Where integrity of data and transactional integrity matter, consider a relational DB
  - This is basically a tradeoff between scalability and transactional integrity
    - Relational DBs can scale, but doing so is complex
    - NoSQL can also support data integrity
- Scaling
  - Split DB reads from writes: typically there are many more reads than writes
  - Consider a sharding strategy early on
  - Take advantage of a CDN
- Self-contained applications
  - In a cloud environment you need to be able to replicate an application very quickly (when scaling / for disaster recovery)
- Network assumptions
  - Avoid assumptions about network topologies
  - Keep the network architecture SIMPLE
  - Some EJB servers require broadcast/multicast for clustering, but this is not supported in all clouds
- Disk and network I/O
  - You need to worry about them in the cloud, especially in conjunction with virtualisation
  - Heavy I/O applications or chatty network applications have to worry
    in particular. It doesn’t mean that they are not suited for the
    cloud - there are workarounds
- Assume the cloud is a hostile environment
  - Trusting no one is key
  - Handle passwords very carefully
  - Encrypt network traffic
  - And most important: install an IDS (intrusion detection system)
- Design for failure
  - Build redundant components
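
To make the "split DB reads from writes" tip a bit more concrete, here is a minimal sketch of a router that sends writes to a primary and spreads reads over replicas. This was not shown in the talk; the class name `ReadWriteRouter`, the host names and the naive "starts with SELECT" check are all assumptions made for illustration. The hosts are passed in from outside, which also fits the "no hard-coded IP addresses" tip.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: routes SQL statements to a primary (writes) or a replica (reads). */
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    /** Returns the name of the node that should execute the statement. */
    String route(String sql) {
        // Naive classification (assumption): everything that is not a SELECT is a write.
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || replicas.isEmpty()) {
            return primary;
        }
        // Round-robin over the replicas; floorMod keeps the index valid after overflow.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        // Node names would normally come from configuration, not from code.
        ReadWriteRouter router =
            new ReadWriteRouter("db-primary", Arrays.asList("db-replica-1", "db-replica-2"));
        System.out.println(router.route("SELECT * FROM tweets"));
        System.out.println(router.route("INSERT INTO tweets VALUES (1)"));
    }
}
```

A real setup also has to deal with replication lag (a read right after a write may hit a stale replica), which this sketch ignores.
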

When it comes to Java applications, there are a few pointers to consider:

- Use message queues for communication when possible. Avoid RMI (to keep the network topology simple) and SOAP (due to its bloat).
- EJBs should be avoided due to the network problems in clustered environments. EJBs are also more resource-intensive.
- Consider using multiple processes and not just multiple threads. Threads cannot be split up across multiple JVMs, processes can.

My overall impression of the session was that it was very high-level -
good if you haven’t had much exposure to cloud applications. It
touched on all the main topics, but unfortunately didn’t delve into any
of the details. A lot of the topics covered were self-evident.

# Hadoop and NoSQL at Twitter (Dmitriy Ryaboy)

Actually, this talk was not really about Hadoop, but about scaling
large data sets at Twitter.
There are a lot of different kinds of scaling problems, but there are
general principles which can be applied to solving yours. And there is
a good chance someone has already solved your problem. Twitter has to deal with
95 million tweets per day, 3000 tweets per second.

A single master with many read slaves doesn’t work here because of write
speed bottlenecks, and it does not play well with multiple data centers.
Snowflake, the standalone distributed UID generator Twitter is using, is time-dominant, which
means data is roughly time-sorted.
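
What "time-dominant" means can be shown with a small sketch: if the timestamp occupies the most significant bits of the ID, sorting by ID is roughly sorting by time. The bit widths (10 bits for a worker ID, 12 bits for a per-millisecond sequence) and the class name are illustrative assumptions on my side; they were not taken from the talk.

```java
/** Hypothetical time-ordered ID generator; bit layout chosen for illustration only. */
final class TimeOrderedIdGenerator {
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    TimeOrderedIdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        return nextId(System.currentTimeMillis());
    }

    /** Builds an ID from an explicit timestamp (handy for testing). */
    synchronized long nextId(long timestampMillis) {
        if (timestampMillis < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (timestampMillis == lastTimestamp) {
            // Same millisecond: bump the sequence, fail if it wraps around.
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                throw new IllegalStateException("sequence exhausted for this millisecond");
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestampMillis;
        // Timestamp in the high bits makes the IDs sort by time first.
        return (timestampMillis << (WORKER_BITS + SEQUENCE_BITS))
            | (workerId << SEQUENCE_BITS)
            | sequence;
    }
}
```

IDs are only "roughly" time-sorted across machines because the clocks of different workers are never perfectly in sync.
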

Gizzard is Twitter’s sharding framework.
Its key features are spreading the keyspace across many nodes and
replication. Messages are mapped to shards, and shards are mapped to
replication trees. Shards are abstracted (MySQL, Lucene, Redis, logical
shards). Ranges of keys are mapped to shards. Replication is
controlled by various possible replication policies. Fault tolerance
is realized by re-enqueuing failed writes, which means that writes must be
commutative and idempotent. Stale reads can happen (CALM:
Consistency As Logical Monotonicity).
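
Why re-enqueued writes have to be commutative and idempotent is easiest to see in code. The following is a minimal sketch of a "last write wins" store, assuming every write carries a timestamp; it is not Gizzard code, and the class and method names are made up. Applying the same write twice, or applying writes in a different order, ends in the same state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store whose writes can be replayed and reordered safely. */
final class LastWriteWinsStore {
    private static final class Versioned {
        final long timestamp;
        final String value;

        Versioned(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    /** Keeps the write with the highest timestamp; ties are broken by comparing values. */
    void apply(String key, String value, long timestamp) {
        Versioned current = data.get(key);
        boolean newer = current == null
            || timestamp > current.timestamp
            || (timestamp == current.timestamp && value.compareTo(current.value) > 0);
        if (newer) {
            data.put(key, new Versioned(timestamp, value));
        }
    }

    String get(String key) {
        Versioned current = data.get(key);
        return current == null ? null : current.value;
    }
}
```

With such writes, a failed write can simply be put back on the queue and retried later, even if newer writes have been applied in the meantime.
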

Haplocheirus is a vector
cache. There are 1.2 million deliveries of posts per second, all of which
would otherwise have to be queried for. Assembling the timeline is expensive if
“assemble on read” is used. “Assemble on write” has high storage costs
and is expensive for popular users. The latter can be fixed by async
writes. For this, an LRU cache is used, which is currently Memcache. In
the future Twitter will use Haplo, a Redis-based timeline store. The
conclusion is to use precomputing wisely.
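
For readers who have not met the term: an LRU (least recently used) cache evicts the entry that has not been touched for the longest time once it is full. In plain Java this can be sketched in a few lines on top of `LinkedHashMap`. This is only an in-process illustration of the eviction policy (the class name `LruCache` is mine), not how Twitter's distributed cache is built.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache: evicts the least recently accessed entry when over capacity. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order goes from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > capacity;
    }
}
```

Note that `LinkedHashMap` is not thread-safe, so a shared cache would need external synchronization.
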

FlockDB is a social graph
store. It is realized by several tables holding relations, which
are partitioned by user ID. It is Twitter’s current solution for holding
user relationships and calculating intersections.
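
An "intersection" here is something like "which users follow both A and B?". As a toy, in-memory sketch (FlockDB does this on partitioned tables, which is a different story), assuming follower IDs are simply kept in sets and with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory illustration of a follower intersection. */
final class SocialGraph {
    /** Returns the IDs contained in both sets without modifying the inputs. */
    static Set<Long> intersection(Set<Long> a, Set<Long> b) {
        // Copy the smaller set so that the temporary result stays small.
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = smaller == a ? b : a;
        Set<Long> result = new HashSet<>(smaller);
        result.retainAll(larger);
        return result;
    }
}
```
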

Cassandra is used by Twitter for large
scale data mining, a geo database and realtime analytics. Lucene is
used for searches on the geo database.

Rainbird, built on top of Cassandra, is used for time series analytics.

Cuckoo is used for cluster monitoring (not open source yet).

Hadoop is used for offline processing at
Twitter. 1000 machines, billions of API requests, 12 TB of data
ingested into Hadoop per day and 95 million tweets per day add up to a
huge amount of data, for which an OLAP database is not a good fit.
Hadoop scales well to large data sizes, but it is slower than a
specialist OLAP DB. Twitter uses a hybrid approach, with Vertica used
for table aggregations and Hadoop for logs etc. Scribe (originating
from Facebook) is used for logging.

Elephant-Bird is a
library for working with data in Hadoop. Thrift, Avro and Protocol
Buffers are serialization frameworks, which give a compact description
of data and are backwards compatible. Very useful for logging data for
later data analysis. Elephant-Bird uses Protocol Buffers for dealing
with Hadoop I/O Format.

HBase and Pig (a declarative dataflow language) are used for
analytics within Twitter. Howl is
an abstraction to seamlessly work with Pig and Hive.

Recommendations:

- Precompute results if the query space is limited.
- Provide narrow query interfaces. Optimize them.
- Stay CALM for eventual consistency.
- Sharding and replication is a pattern (use a framework).
- Use existing tools.

Wow, what a firework of tools, some of which I hadn’t even heard about. I guess there
is quite a lot to catch up on in order to follow the latest data
modeling trends. Good talk, probably a bit too much new stuff for me.

# Activiti (Tom Baeyens, Joram Barrez)

Activiti is a new BPM project led by the former jBPM head Tom Baeyens
under the umbrella of Alfresco. It is licensed under the Apache
License and is a BPMN 2.0 engine. Activiti can be embedded in any Java
environment and is extensible. One of the technical advantages of
Activiti compared to jBPM is its Spring support from the very
beginning. Quite a bunch of tools surround Activiti:

- Web-based BPMN 2.0 graphical editor
- Activiti Explorer for task management
- Activiti Probe for administrative functionality
- Activiti Cycle for BPM collaboration
- REST API
- Activiti Eclipse designer (including BPMN 2.0 validation)
- Activiti Grails integration

An example of a simple BPMN 2.0 notation used by Activiti looks like:

<?xml version="1.0" encoding="UTF-8"?>

<definitions id="definitions"
xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
targetNamespace="http://www.activiti.org/bpmn2.0">

  <process id="helloWorld">

    <startEvent id="start" />
    <sequenceFlow id="flow1" sourceRef="start" targetRef="script" />
    <scriptTask id="script" name="HelloWorld" scriptFormat="groovy">
      <script>
        System.out.println("Hello world")
      </script>
    </scriptTask>
    <sequenceFlow id="flow2" sourceRef="script" targetRef="theEnd" />
    <endEvent id="theEnd" />

</process>

</definitions>

This is how Activiti uses this process:

// Bootstrap
ProcessEngine processEngine = new DbProcessEngineBuilder()
  .configureFromPropertiesResource("activiti.properties")
  .buildProcessEngine();
ProcessService processService = processEngine.getProcessService();

// Deployment
processService.createDeployment()
  .addClasspathResource("hello-world.bpmn20.xml")
  .deploy();

// Run
processService.startProcessInstanceByKey("helloWorld");

A kind of real-world example (obtaining a loan from a bank) was
introduced and clicked through. It included integration with Alfresco,
where documents were created and managed. Excel integration is there
as well.

Activiti has nice JUnit support for unit testing your
processes using custom annotations. There is also a query API for querying process
instances.

In a 1-minute crash movie, Joram demonstrated how easy it is to set up
Activiti with a default setup along with all those nice tools.

It is really impressive what Activiti has achieved in the few months of
its existence. I’m pretty sure that Activiti is (or will become) the king
of open source BPM, and maybe beyond. Activiti is definitely worth a
try.

BTW, I have never seen a speaker (Joram Barrez) overtaking himself while
speaking that fast ;-)

# Akka (Viktor Klang)

The speaker started his session by mentioning that he has to recover
from 9 years of Java development, which made me crack up a bit :-) Akka
is a technology written in Scala that offers both a Scala and a Java API.

He continued by listing the vision: that it should be simple to write
concurrent, fault-tolerant and scalable applications using Akka.

Here is the overview he presented:

- Simpler concurrency
- Event-driven architecture
- True scalability
- Fault tolerance
- Transparent remoting
- Java & Scala API

Akka’s programming model is all about actors on the JVM, and it indeed
seems very easy to use.

Here is an example in Scala which I copied from http://akkasource.org:

// server code
class HelloWorldActor extends Actor {
 def receive = {
   case msg => self reply (msg + " World")
 }
}
RemoteNode.start("localhost", 9999).register(
 "hello-service", actorOf[HelloWorldActor])

// client code
val actor = RemoteClient.actorFor(
 "hello-service", "localhost", 9999)
val result = actor !! "Hello"

Note that !! (bang bang) is not built-in syntax but simply a method with
a symbolic name. In the Java API the corresponding method is called “sendRequestReply”.

A test project using Akka is online

# Google Web Toolkit (David Geary)

David is obviously an expert on Java and user interfaces. He has written
an impressive number of books about Swing, JavaServer Faces (JSF), Advanced
JSP, the JSP Standard Tag Library, and the Google Web Toolkit.

His demo was quite enjoyable. He (re)coded on the fly a nice little
web app called “Places”, which contains content from Yahoo! Maps, not
without some errors in Eclipse. His comment on that was: “That’s why
when I’m at home I pay for IntelliJ.”

I found the slides for his demo also here

There is also a Quake demo on YouTube. Quake is running inside of a
browser. This program was made with GWT.

David presented some news about features in GWT 2.0:

- Just released (28/10/2010)
- There is no fake browser any more as in version 1.0; instead there are hosted mode browser plugins (I think for Firefox, Safari and IE)
- Layouting is completely new. It is very similar to Swing now (remember GridBagLayout and so on)
- Event listeners (also similar to Swing/AWT/SWT), but there is no need for adapters anymore since there are EventHandlers now
- “History” is also a nice feature. Using this you can browse through the states of your web app by clicking forward or backward in your browser, as described in the GoF Memento pattern
- UiBinder: widgets can now be declared in XML (because people complained about too much Java code) and can be accessed via Java annotations
- Monitoring with Speed Tracer looks quite comfortable (it works for any web app, not only GWT)

I think it would be fun to play around with that technology a little and maybe use it in my own
programs.

# Android UI Development: Tips, Tricks and Techniques (Romain Guy, Chet Haase)

Let’s talk about garbage! On mobile devices garbage matters! Garbage
that is generated every time an animation is running on your mobile
device can cause serious problems. So
remember to keep garbage at a minimum level when dealing with
mobile devices, just like you would do in normal life :) Chet Haase and
Romain Guy talked about tips and tools addressing performance and
memory leaks on mobile devices.

Temporaries. Sometimes it is necessary to have temporary objects such as local
variables, but you should always consider using a static final class
member instead.

Autoboxing creates objects! If you do not need an object type, use
primitive types instead so that allocation is minimized.

Iterator. Enhanced for() loops are great, but they create garbage,
because each loop instantiates a new iterator. What can we do
about this? Consider doing a size check before the enhanced
for() loop, which prevents iterators from being created for empty lists.

if (nodeList.size() > 0) {
   for(Node node : nodeList) {
       //do something
   }
}

Image recycling. Recycle the bitmaps on mobile devices. Bitmaps are
finalized, and finalizers may clear the data … eventually … at some
time. Even setting the reference to null (myBitmap = null) does not help here. What
you want to do is call myBitmap.recycle(). Do not wait for the
finalizer to do the work if you need that memory now. The references may
all be gone, but the memory is not yet free for new allocations when you
really need it.

Varargs. Variable arguments on methods are packaged into a
temporary array. Be aware of that and double-check whether you really
need the variable arguments.

Generics. Generics only deal with objects. Primitive types get
autoboxed, which causes memory allocation.
Do we really need that type parameter in MyClass? Consider other
ways than generics on mobile devices.

MyClass<Float> myObject = new MyClass<Float>();
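
One such "other way" is a small primitive-specialized class. The sketch below is my own illustration, not code from the talk; `FloatHolder` is a hypothetical stand-in for whatever `MyClass<Float>` was meant to do. Because the field is a plain `float`, reading and writing it never creates a `Float` object.

```java
/** Hypothetical primitive-specialized replacement for a generic MyClass<Float>. */
final class FloatHolder {
    private float value;

    FloatHolder(float value) {
        this.value = value;
    }

    /** Returns the primitive directly; no Float object is involved. */
    float get() {
        return value;
    }

    void set(float value) {
        this.value = value;
    }

    public static void main(String[] args) {
        FloatHolder alpha = new FloatHolder(0.25f);
        // Updating in a loop (e.g. per animation frame) allocates nothing.
        for (int frame = 0; frame < 3; frame++) {
            alpha.set(alpha.get() + 0.25f);
        }
        System.out.println(alpha.get());
    }
}
```

The price is one class per primitive type you need, so this is a tradeoff between garbage and code duplication.
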

Tools and demos. The rest of the talk showed some tools for
finding and checking performance issues and memory allocations. The
speakers gave demos of several very useful tools. The two most
exciting tools are:
- DDMS: allocation tracking and limiting allocations. It counts
  the allocations being made. DDMS comes as a standalone version or
  as an Eclipse plugin.
- hat (Heap Analysis Tool): tracks down memory leaks.
  The demo on heap size analysis and memory leak detection showed how
  bitmap drawables keep a back reference to the view in order
  to be able to refresh it. This reference keeps memory allocated
  unnecessarily when those bitmap drawables are cached in static fields.
  Following from that:
  - be careful with the context
  - be careful with static fields
  - avoid non-static inner classes
  - use weak references

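
The advice on static fields and weak references can be sketched in plain Java (no Android classes, so it runs anywhere). The class name `WeakCache` is hypothetical. The point is that a static cache holding only a `WeakReference` does not keep the cached object, and everything reachable from it, alive on its own.

```java
import java.lang.ref.WeakReference;

/** Hypothetical single-slot cache that does not prevent garbage collection of its value. */
final class WeakCache<T> {
    private WeakReference<T> ref = new WeakReference<T>(null);

    synchronized void put(T value) {
        ref = new WeakReference<T>(value);
    }

    /** Returns the cached value, or null if it was never set or has been collected. */
    synchronized T get() {
        return ref.get();
    }
}
```

Callers always have to handle the `null` case and recreate the value if needed, which is the price for not leaking.
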
Responsiveness. A single-threaded UI on mobile devices requires
special care! If you block the UI thread, you block the user
interaction. Instead, use async tasks with messaging or
handlers with messaging.

Overinvalidation. Render things only when they need to be
rendered. Do not draw more than you really should. Custom components
need to take care of invalidation. This point was followed by a
very impressive demo on profiling. Traceview shows
exactly what is going on on the canvas and points to components that
are unnecessarily rendered all the time. This was caused by wrong
invalidation. The solution is simple: just invalidate the sections
you need to refresh.

# The modular Java Platform (Mark Reinhold)

Mark Reinhold is Chief Architect of the Java Platform Group at Oracle,
where he works on the Java Platform, Standard Edition, and OpenJDK.

This session was about how Java 7 or later will handle application
construction, packaging and publication. In other words: how do we get
rid of the JAR hell which we have now?

Mark explained that in Project Jigsaw they have already resolved a lot of
problems. These solutions will come along with Java 7.

The main solution is the Modular Java Platform, which enables an escape from JAR hell by:

- eliminating the classpath
- recording dependencies directly in source code
- packaging modules for automatic download & install
- easily generating sensible rpm/deb/svr/ips packages

The module system requirements are:

- Fast class loading
  - during startup and throughout runtime
  - on all types of devices
  - the current class-path mechanism is too slow
- Predictability
- Package subsets: cannot (massively) refactor the existing SE API set
- Substitutability: to support refactoring modules over time
- Optionality
  - a method can depend upon an optional module
  - presence/absence of the optional module is detected at install
  - the method handles presence/absence at runtime
- Self-applicability
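
The "optionality" requirement is something people approximate today on the plain classpath with a reflective lookup. The sketch below shows that workaround; it is explicitly not the Jigsaw mechanism, and the class name `OptionalDependency` is made up for this post.

```java
/** Hypothetical helper: checks at runtime whether an optional class is available. */
final class OptionalDependency {
    /** Returns true if the named class can be found, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing or unloadable: treat the optional feature as absent.
            return false;
        }
    }

    public static void main(String[] args) {
        // com.foo.extra.Feature is a placeholder name for some optional add-on class.
        if (isPresent("com.foo.extra.Feature")) {
            System.out.println("optional feature available");
        } else {
            System.out.println("running without optional feature");
        }
    }
}
```

A module system that knows about optional dependencies at install time makes this kind of guesswork unnecessary.
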

Here are some examples of how modules can be declared:

Grouping Example

//module-info.java
module com.foo {
    class com.foo.Main;
    ...
}

//module-info.java
module com.foo {
    requires org.bar.lib;
    requires org.baz.lib;
}

Versioning Example

//module-info.java

module com.foo @ 1.0.0 {
    requires org.bar.lib @ 2.1-alpha;
    requires org.baz.lib @ 2.0;
}

Encapsulation Example

//module-info.java

module com.foo  @ 3 {
    permits org.bar.lib;
}

Optional Modules Example

//module-info.java

module com.foo {
    requires org.bar.lib;
    requires optional com.foo.extra;
}

Here is an example of how modules are compiled into a module path:

$ javac -modulepath mods src/com.foo.app/...
$ ls mods
com.foo.app/
com.foo.extra/
com.foo.lib/

And this is an example of how a module can be packaged for Debian:

$ jpkg -m mods deb com.foo.app com.foo.lib

# Getting things done for programmers (Kito Mann)

Kito Mann is the author of “JavaServer Faces in Action” and he runs
the http://jsfcentral.com website.

The whole talk is based on the book “Getting Things Done” by David
Allen.

The talk began with a description of a programmer’s daily life, being
bombarded with emails, tweets, phone calls and meetings.

All of this results in too many things to do - sounds familiar to me.
He uses the picture of unclosed loops for that and
describes the goal of “GTD” as closing those loops, so that you stop
constantly thinking about them and have more energy left to get things done.

GTD works like this:

- Collect things in an inbox by writing them down.
- Process your inbox with the goal of a zero inbox: put each item into the trash, or make a task or project out of it.
- Break projects into tasks which you can complete in minutes or hours.
- Defer, delegate, delete or process tasks.
- Put tasks in contexts such as cellphone, internet, office, whatever.
- Filter what you can currently do by contexts.
- It might be useful to use project IDs such as “presentations.gtd.programmers”.
- Do a weekly review of your tasks. The outcome might be more tasks, task updates or deleted tasks.
- Select the tasks you do based on their context, the time available, your energy level and their priority.
- Focus on tasks: work on one task at a time, avoid distraction, box your time (Pomodoro technique).

Then Mr Mann started talking about tools which can be used to do GTD.
I left at this point …

The talk was very good overall and worth attending. I think “Getting
Things Done” has some interesting ideas in it but is too much of a
process for me. It seems very restrictive and not flexible enough. But
I will definitely try out closing my mail client from time to time to
get things done…

Author: Roland Huß

Tags: cloud, devoxx, Java, scala, conference

Categories: devoxx, java, development