The last full conference day of Devoxx was again packed with very interesting talks of various kinds. It started with a keynote about the roadmap of JEE 7. In summary, we can expect some smooth refinements of the platform (except maybe the support for virtualization out of the box). Here are our impressions of the talks of Thursday. Please expect our summary blog post on Monday, since we are all now in a rush to get things done and to catch trains, planes, etc. We hope you enjoyed the blog flood so far ;-)
# Designing Java Systems to Operate at a Cloud Scale (George Reese)
The talk focused on how to architect cloud applications in
 general. The main tips given were:
When it comes to Java applications, there are a few pointers to consider:
My overall impression of the session was that it was very high-level -
 good if you haven’t had much exposure to cloud applications. It
 touched on all the main topics, but unfortunately didn’t delve into any
 of the details. A lot of the topics covered were self-evident.
Actually, this talk was not really about Hadoop, but about scaling
 large data sets at Twitter.
 There are a lot of different kinds of scaling problems, but there are
 general principles which can be applied to solving yours. And there is
 a good chance someone has already solved your problem. Twitter has to deal with
 95 million tweets per day, 3,000 tweets per second.
A single master with many read slaves doesn’t work here because of write-speed
 bottlenecks, and it does not play well with multiple data centers.
 Snowflake, the standalone distributed UID generator Twitter is using, is time-dominant, which
 means the data is roughly time-sorted.
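A rough sketch of the idea behind such time-dominant IDs; the class name and the field widths are my assumptions for illustration, not Snowflake’s exact layout:
// Rough sketch of a time-dominant id: the millisecond timestamp occupies the
// high bits, so sorting by id roughly sorts by creation time.
public class TimeDominantIdSketch {
    private final long workerId;       // identifies the generating node
    private long lastTimestamp = -1L;
    private long sequence = 0L;        // counter for ids within one millisecond

    public TimeDominantIdSketch(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp == lastTimestamp) {
            sequence++;                // several ids generated in the same millisecond
        } else {
            sequence = 0L;
            lastTimestamp = timestamp;
        }
        // timestamp | worker id | sequence (assumed widths: 41 / 10 / 12 bits)
        return (timestamp << 22) | (workerId << 12) | (sequence & 0xFFFL);
    }
}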
Gizzard is Twitter’s sharding framework,
 whose key features are spreading the keyspace across many nodes and
 replication. Messages are mapped to shards, and shards are mapped to
 replication trees. Shards are abstracted (MySQL, Lucene, Redis, logical
 shards). Ranges of keys are mapped to shards. Replication is
 controlled by various possible replication policies. Fault tolerance
 is realized by re-enqueueing failed writes, but writes must be
 commutative and idempotent. Stale reads can happen (CALM:
 Consistency As Logical Monotonicity).
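A very small sketch of the underlying idea of mapping key ranges to shards and re-enqueueing failed writes; the class and method names are mine, not Gizzard’s:
import java.util.LinkedList;
import java.util.Queue;
import java.util.TreeMap;

// Illustrative range-based sharding with a retry queue for failed writes.
// Because failed writes are simply re-applied later, possibly out of order,
// the write operations must be commutative and idempotent.
public class ShardingSketch {
    private final TreeMap<Long, String> ranges = new TreeMap<Long, String>(); // range lower bound -> shard
    private final Queue<Runnable> retryQueue = new LinkedList<Runnable>();

    public void addRange(long lowerBound, String shardName) {
        ranges.put(lowerBound, shardName);
    }

    public String shardFor(long key) {
        // assumes a range starting at Long.MIN_VALUE has been registered
        return ranges.floorEntry(key).getValue(); // shard whose range contains the key
    }

    public void write(long key, Runnable writeOperation) {
        try {
            writeOperation.run();           // in a real system: apply to shardFor(key)
        } catch (RuntimeException e) {
            retryQueue.add(writeOperation); // re-enqueue; safe only for idempotent writes
        }
    }
}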
Haplocheirus is a vector
 cache. There are 1.2 million deliveries of posts per second, which would all have
 to be queried for. Assembling the timeline is expensive if
 “assemble on read” is used. “Assemble on write” has high storage costs
 and is expensive for popular users. The latter can be fixed by async
 writes. For this, an LRU cache is used, which is currently Memcache. In
 the future Twitter will use Haplo, a Redis-based timeline store. The
 conclusion is to use precomputing wisely.
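A toy sketch of “assemble on write” with an LRU cache; all names and sizes are illustrative, and the real system does the fan-out asynchronously against Memcache/Haplo rather than an in-process map:
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Fan a new tweet id out to the cached timelines of all followers ("assemble on
// write"), keeping only a bounded number of recently used timelines (LRU).
public class TimelineCacheSketch {
    private static final int MAX_CACHED_USERS = 100000; // arbitrary cache size
    private static final int TIMELINE_LENGTH = 800;     // entries kept per user

    private final Map<Long, Deque<Long>> timelines =
        new LinkedHashMap<Long, Deque<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Deque<Long>> eldest) {
                return size() > MAX_CACHED_USERS; // evict the least recently used timeline
            }
        };

    public void onNewTweet(long tweetId, List<Long> followerIds) {
        for (Long followerId : followerIds) {
            Deque<Long> timeline = timelines.get(followerId);
            if (timeline == null) {
                timeline = new ArrayDeque<Long>();
                timelines.put(followerId, timeline);
            }
            timeline.addFirst(tweetId);             // newest entries first
            if (timeline.size() > TIMELINE_LENGTH) {
                timeline.removeLast();              // bound the storage cost per user
            }
        }
    }
}
This also makes the “expensive for popular users” problem visible: one tweet from a user with millions of followers turns into millions of cache updates, which is why those writes are done asynchronously.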
FlockDB is a social graph
 store. It is realized by several tables holding relations,
 partitioned by user id. It is Twitter’s current solution for holding
 user relationships and calculating intersections.
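As a toy illustration of the kind of operation meant here (not FlockDB’s API): intersecting the sets of user ids that two users follow.
import java.util.HashSet;
import java.util.Set;

// Toy example of a social-graph intersection, e.g. "which users do A and B both follow?".
// FlockDB answers this over relation tables partitioned by user id; the in-memory
// sets here only illustrate the operation itself.
public class GraphIntersectionSketch {
    public static Set<Long> commonFollowings(Set<Long> followedByA, Set<Long> followedByB) {
        Set<Long> common = new HashSet<Long>(followedByA);
        common.retainAll(followedByB); // keep only ids present in both sets
        return common;
    }
}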
Cassandra is used by Twitter for large-scale
 data mining, a geo database and real-time analytics. Lucene is
 used for searches on the geo database.
Rainbird, built on top of Cassandra, is used for time series analytics.
Cuckoo is used for cluster monitoring (not open source yet).
Hadoop is used for offline processing at
 Twitter. 1,000 machines, billions of API requests, 12 TB of ingested
 data and 95 million tweets per day generate a huge amount of data, for
 which an OLAP database is not a good fit. Hadoop scales well to
 large data sizes, but it is slower than a specialist OLAP DB. Twitter
 uses a hybrid approach, with Vertica used for table aggregations,
 Hadoop for logs etc. Scribe (originating from Facebook) is used for
 logging. Hadoop ingests 12 TB of data per day.
Elephant-Bird is a
 library for working with data in Hadoop. Thrift, Avro and Protocol
 Buffers are serialization frameworks which give a compact description
 of data and are backwards compatible. They are very useful for logging data for
 later analysis. Elephant-Bird uses Protocol Buffers for dealing
 with Hadoop I/O formats.
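For instance, an API log event serialized with Protocol Buffers stays compact and can gain new fields later without breaking old readers. LogEvent below stands for a hypothetical class generated by the protobuf compiler; it is not part of Elephant-Bird itself.
// LogEvent is a hypothetical protobuf-generated class (from a schema with
// "timestamp" and "url" fields); generated protobuf classes follow this builder pattern.
LogEvent event = LogEvent.newBuilder()
    .setTimestamp(System.currentTimeMillis())
    .setUrl("/1/statuses/home_timeline.json")
    .build();
byte[] compactRecord = event.toByteArray(); // append these bytes to the log for later analysis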
HBase and Pig (a declarative dataflow language) are used for
 analytics within Twitter. Howl is
 an abstraction to seamlessly work with Pig and Hive.
Recommendations:
Wow, what a firework of tools, many of which I had never even heard of. I guess there
 is quite a lot to catch up on in order to follow the latest data
 modeling trends. A good talk, though probably a bit too much new stuff for me.
Activiti is a new BPM project led by the former jBPM head Tom Baeyens
 under the umbrella of Alfresco. It is licensed under the Apache
 License and is a BPMN 2.0 engine. Activiti can be embedded in any Java
 environment and is extensible. One of the technical advantages of
 Activiti compared to jBPM is its Spring support from the very
 beginning. Quite a bunch of tools surround Activiti:
A simple example of the BPMN 2.0 notation used by Activiti looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<definitions id="definitions"
xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
targetNamespace="http://www.activiti.org/bpmn2.0">
  <process id="helloWorld">
    <startEvent id="start" />
    <sequenceFlow id="flow1" sourceRef="start" targetRef="script" />
    <scriptTask id="script" name="HelloWorld" scriptFormat="groovy">
      <script>
        System.out.println("Hello world")
      </script>
    </scriptTask>
    <sequenceFlow id="flow2" sourceRef="script" targetRef="theEnd" />
    <endEvent id="theEnd" />
</process>
</definitions>
This is how Activiti uses this process:
// Bootstrap
ProcessEngine processEngine = new DbProcessEngineBuilder()
  .configureFromPropertiesResource("activiti.properties")
  .buildProcessEngine();
ProcessService processService = processEngine.getProcessService();
// Deployment
processService.createDeployment()
  .addClasspathResource("hello-world.bpmn20.xml")
  .deploy();
// Run
processService.startProcessInstanceByKey("helloWorld");
Some sort of real-world example (obtaining a loan from a bank) was
 introduced and clicked through. It included integration with Alfresco,
 where documents were created and managed. Excel integration is there
 as well.
Activiti has nice support for JUnit, letting you unit test your
 processes using custom annotations. There is also a query API for querying process
 instances.
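As a rough sketch of what such a test could look like (reusing only the API calls from the snippets above; the custom annotations mentioned in the talk would remove most of this boilerplate):
// Minimal JUnit 4 sketch, assuming the hello-world process from above is on the classpath.
public class HelloWorldProcessTest {

  @org.junit.Test
  public void deploysAndStartsHelloWorldProcess() {
    ProcessEngine processEngine = new DbProcessEngineBuilder()
        .configureFromPropertiesResource("activiti.properties")
        .buildProcessEngine();
    ProcessService processService = processEngine.getProcessService();

    processService.createDeployment()
        .addClasspathResource("hello-world.bpmn20.xml")
        .deploy();

    // The script task prints "Hello world"; reaching this line without an
    // exception means deployment and process start worked.
    processService.startProcessInstanceByKey("helloWorld");
  }
}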
In a 1-minute crash movie, Joram demonstrated how easy it is to set up
 Activiti with a default configuration along with all those nice tools.
It is really impressive what Activiti has achieved in the few months of
 its existence. I’m pretty sure that Activiti is (or will become) the king
 of open source BPM, and maybe beyond. Activiti is definitely worth a
 try.
BTW, I have never seen a speaker (Joram Barrez) overtake himself by
 speaking that fast ;-)
The speaker started his session by mentioning that he has to recover
 from 9 years of Java development, which made me crack up a bit :-) Akka
 is a technology which is written in both Scala and Java.
He continued by listing the vision: that it is simple to write
 concurrent, fault-tolerant and scalable applications using Akka.
Here is the overview he presented:
In its programming model, Akka is all about its actor implementation
 for the JVM, and it indeed seems very easy to use.
Here is an example in Scala which I copied from http://akkasource.org:
// server code
class HelloWorldActor extends Actor {
 def receive = {
   case msg => self reply (msg + " World")
 }
}
RemoteNode.start("localhost", 9999).register(
 "hello-service", actorOf[HelloWorldActor])
// client code
val actor = RemoteClient.actorFor(
 "hello-service", "localhost", 9999)
val result = actor !! "Hello"
Note that !! (“bang bang”) is an overloaded operator. In Java, this
 method is called “sendRequestReply”.
A test project using Akka is online
Other topics the speaker mentioned:
Transaction demarcation is, by the way, very nice if you use Scala:
atomic {
    ...
    atomic {
         // transactions compose!!!
    }
}
If you want to learn more about Akka, go to this link.
 By the way, it is open source.
The technology seems to be quite cool and probably deserved a better speaker :-(
David is obviously an expert on Java and user interfaces. He has written
 an impressive number of books about Swing, JavaServer Faces (JSF), Advanced
 JSP, the JSP Standard Tag Library, and the Google Web Toolkit.
His demo was quite enjoyable. He (re)coded on the fly a nice little
 web app called “Places”, containing content from Yahoo! Maps, not
 without some errors in Eclipse. His comment on that was: “That’s why,
 when I’m at home, I pay for IntelliJ.”
I also found the slides for his demo here.
There is also a Quake demo on YouTube, where Quake runs inside a
 browser; it was built with GWT.
David also shared some news about features in GWT 2.0:
I think it will be fun to play around a bit with that technology and maybe use it in my own
 programs.
Let’s talk about garbage! On mobile devices, garbage matters! Garbage
 that is generated every time an animation runs on your mobile device
 can cause serious problems. So
 keep in mind to keep garbage to a minimum when dealing with
 mobile devices, just like you would in normal life :) Chet Haase and
 Romain Guy talked about tips and tools for dealing with performance and
 memory leaks on mobile devices.
Autoboxing creates objects! If you do not need an object type, use
 primitive types instead so that allocation is minimized.
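A quick plain-Java illustration of the difference:
// The boxed variant may allocate a new Integer object on every iteration
// (autoboxing); the primitive variant allocates nothing at all.
Integer boxedSum = 0;
int primitiveSum = 0;
for (int i = 0; i < 1000; i++) {
    boxedSum += i;      // unbox, add, box again -> garbage in the loop
    primitiveSum += i;  // pure primitive arithmetic
}
The snippet below checks the list size before iterating; presumably the point is that the enhanced for loop allocates an iterator object, which is wasted work for an empty list.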
if (nodeList.size() > 0) {
   for(Node node : nodeList) {
       //do something
   }
}
Other tips: release references to large objects explicitly (myBitmap = null), recycle bitmaps yourself (myBitmap.recycle()) instead of waiting for the garbage collector to reclaim them, and keep in mind that generics always work on boxed types, so something like MyClass<Float> myObject = new MyClass<Float>(); will box every float you pass to it.
Responsiveness: the single-threaded UI on mobile devices requires
 special care! If you block the UI thread, you block the user
 interaction. Instead, use async tasks with messaging or
 handlers with messaging.
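A minimal sketch of that pattern using Android’s AsyncTask; loadThumbnail and imageView are placeholders I made up for the example:
// Do the slow work off the UI thread in doInBackground(), then update the UI
// in onPostExecute(), which runs back on the UI thread.
new AsyncTask<String, Void, Bitmap>() {
    @Override
    protected Bitmap doInBackground(String... urls) {
        return loadThumbnail(urls[0]); // placeholder for slow I/O or decoding
    }

    @Override
    protected void onPostExecute(Bitmap thumbnail) {
        imageView.setImageBitmap(thumbnail); // safe: we are on the UI thread again
    }
}.execute("http://example.com/thumbnail.png");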
Mark Reinhold is Chief Architect of the Java Platform Group at Oracle,
 where he works on the Java Platform, Standard Edition, and OpenJDK.
This session was about how Java 7 and later will handle application
 construction, packaging and publication. In other words: how do we get
 rid of the JAR hell we have now?
Mark explained that in Project Jigsaw they have already resolved a lot
 of problems. These solutions will come along with Java 7.
The main solution is the Modular Java Platform,
which enables escape from JAR hell by:
The Module system requirements are:
Here are some examples of how modules can be declared:
//module-info.java
module com.foo {
    class com.foo.Main;
    ...
}
//module-info.java
module com.foo {
    requires org.bar.lib;
    requires org.baz.lib;
}
//module-info.java
module com.foo @ 1.0.0 {
    requires org.bar.lib @ 2.1-alpha;
    requires org.baz.lib @ 2.0;
}
//module-info.java
module com.foo  @ 3 {
    permits org.bar.lib;
}
//module-info.java
module com.foo {
    requires org.bar.lib;
    requires optional com.foo.extra;
}
$ javac -modulepath mods src/com.foo.app/...
$ ls mods
com.foo.app/
com.foo.extra/
com.foo.lib/
$ jpkg -m mods deb com.foo.app com.foo.lib
Kito Mann is the author of “JavaServer Faces in Action” and he runs
 the http://jsfcentral.com website.
The whole talk is based on the book “Getting Things Done” by David
 Allen.
The talk began with a description of a programmer’s daily life: being
 bombarded with e-mails, tweets, phone calls and meetings.
All of this results in too many things to do - sounds familiar to me.
 He uses the picture of unclosed loops for that, and
 describes the goal of “GTD” as closing those loops to avoid
 constantly thinking about them, leaving more energy to get things done.
GTD works like this:
Then Mr. Mann started talking about tools which can be used for GTD.
 I left at this point …
The talk was very good overall and worth attending. I think “Getting
 Things Done” has some interesting ideas in it, but it is too much of a
 process for me. It seems very restrictive and not flexible enough. But
 I will definitely try out closing my mail client from time to time to
 get things done…