The last full conference day of Devoxx was again packed with very interesting talks of various kinds. It started with a keynote about the roadmap of JEE 7. In summary, we can expect some smooth refinements of the platform (except maybe the support for virtualization out of the box). Here are our impressions of Thursday's talks. Please expect our summary blog post on Monday, since we are all now in a rush to get things done and to catch our trains, planes etc. We hope you enjoyed the blog flood so far ;-)
# Designing Java Systems to Operate at a Cloud Scale (George Reese)
The talk focused on how to architect cloud applications in
general. The main tips given were:
When it comes to Java Applications there are a few pointers to consider
My overall impression of the session was that it was very high-level -
good if you haven’t had much exposure to cloud applications. It
touched all the main topics, but unfortunately didn’t delve into any
of the details. A lot of the topics covered were self-evident.
Actually, this talk was not really about Hadoop, but about scaling
large data sets at Twitter.
There are a lot of different kinds of scaling problems, but there are
general principles which can be applied to solving yours. And there is
a good chance someone has already solved your problem. Twitter has to deal with
95 million tweets per day, 3000 tweets per second.
A single master with many read slaves doesn’t work here because of
write-speed bottlenecks, and it does not play well with multiple data centers.
Snowflake, the standalone distributed UID generator Twitter is using, is time-dominant, which
means data is roughly time sorted.
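To make "time-dominant" a bit more concrete, here is a minimal sketch of such an ID generator. It is not Twitter's Snowflake code; the class name, the injectable clock and the bit layout (timestamp in the high bits, then a 10-bit worker id and a 12-bit sequence) are assumptions for illustration only. Because the timestamp occupies the most significant bits, sorting by ID roughly sorts by creation time, even across workers.

```java
import java.util.function.LongSupplier;

/**
 * Illustrative time-dominant ID generator, NOT Twitter's actual Snowflake code.
 * Assumed layout: [timestamp in ms | 10-bit worker id | 12-bit sequence].
 */
final class TimeDominantIdGenerator {
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private final LongSupplier clock; // injectable so the sketch can be tested
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    TimeDominantIdGenerator(long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All sequence numbers of this millisecond are used:
                // spin until the clock advances.
                while (now <= lastTimestamp) {
                    now = clock.getAsLong();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // The timestamp sits in the most significant bits, so it dominates ordering.
        return (now << (WORKER_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
    }

    /** Extracts the timestamp part of an ID. */
    static long timestampOf(long id) {
        return id >>> (WORKER_BITS + SEQUENCE_BITS);
    }

    public static void main(String[] args) {
        TimeDominantIdGenerator generator =
                new TimeDominantIdGenerator(1, System::currentTimeMillis);
        long first = generator.nextId();
        long second = generator.nextId();
        System.out.println(first + " then " + second);
    }
}
```

IDs created by different workers in the same millisecond are not ordered by creation time, which is why the data is only "roughly" time sorted.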
Gizzard is Twitter’s sharding framework,
whose key features are spreading the keyspace across many nodes and
replication. Messages are mapped to shards and shards are mapped to
replication trees. Shards are abstracted (MySQL, Lucene, Redis, logical
shards). Ranges of keys are mapped to shards. Replication is
controlled by various possible replication policies. Fault tolerance
is realized by re-enqueuing failed writes, which is why writes must be
commutative and idempotent. Stale reads can happen (CALM:
Consistency As Logical Monotonicity).
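The two ideas from this part, mapping key ranges to shards and keeping writes commutative and idempotent so that they can safely be replayed, can be sketched in a few lines. This is not Gizzard's API: `RangeShardedStore`, its method names and the in-memory "shards" are made up for illustration. The sketch assumes that a write only ever adds a value to a set, which makes repeated or reordered writes harmless.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a range-sharded store (not Gizzard itself). */
final class RangeShardedStore {
    // Lower bound of each key range -> shard name.
    private final TreeMap<Long, String> rangeToShard = new TreeMap<>();
    // Each shard keeps a set of values per key. Adding to a set is idempotent
    // (a replayed write changes nothing) and commutative (order is irrelevant).
    private final Map<String, Map<Long, Set<String>>> shards = new HashMap<>();

    void addRange(long lowerBound, String shard) {
        rangeToShard.put(lowerBound, shard);
        shards.putIfAbsent(shard, new HashMap<>());
    }

    /** Finds the shard whose range starts at or below the key. */
    String shardFor(long key) {
        Map.Entry<Long, String> entry = rangeToShard.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("no shard covers key " + key);
        }
        return entry.getValue();
    }

    void write(long key, String value) {
        shards.get(shardFor(key))
              .computeIfAbsent(key, k -> new HashSet<>())
              .add(value);
    }

    Set<String> read(long key) {
        return shards.get(shardFor(key)).getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        RangeShardedStore store = new RangeShardedStore();
        store.addRange(0L, "shard-a");
        store.addRange(1000L, "shard-b");
        store.write(1500L, "hello");
        store.write(1500L, "hello"); // replayed write, no effect
        System.out.println(store.shardFor(1500L) + " holds " + store.read(1500L));
    }
}
```

If a write fails and is put back into the queue, it may arrive late or twice; with set semantics like these the end result is the same either way.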
Haplocheirus is a vector
cache. Posts are delivered at a rate of 1.2 million per second, and all
of them would have to be queried for. Assembling the timeline is expensive
if “assemble on read” is used. “Assemble on write” has high storage costs
and is expensive for popular users. The latter can be fixed by async
writes. For this, an LRU cache is used, which is currently Memcache. In
the future Twitter will use Haplo, a Redis-based timeline store. The
conclusion is to use precomputing wisely.
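For readers who have not met the term: an LRU (least recently used) cache throws away the entry that has gone unused for the longest time once it is full. Twitter uses Memcache for this; the following is only a tiny in-process sketch of the eviction policy itself, built on the JDK's `LinkedHashMap`. The class name `LruCache` and the capacity are my own choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch based on LinkedHashMap's access order. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order goes from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry when full.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> timelines = new LruCache<>(2);
        timelines.put("alice", "timeline of alice");
        timelines.put("bob", "timeline of bob");
        timelines.get("alice");                      // alice is now most recently used
        timelines.put("carol", "timeline of carol"); // evicts bob
        System.out.println(timelines.keySet());
    }
}
```

Note that this class is not thread-safe; a real cache shared between threads would need synchronization or a dedicated caching library.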
FlockDB is a social graph
store. It is realized by several tables holding relations, which
are partitioned by user id. It is Twitter’s current solution for holding
user relationships and calculating intersections.
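An "intersection" here means a question like "which users follow both A and B?". The following in-memory sketch shows the idea with plain Java sets, keyed by user id. It has nothing to do with FlockDB's real storage or API; `FollowGraph` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory follower graph, keyed by user id. */
final class FollowGraph {
    // user id -> ids of the users following that user
    private final Map<Long, Set<Long>> followersOf = new HashMap<>();

    void follow(long follower, long followee) {
        followersOf.computeIfAbsent(followee, k -> new TreeSet<>()).add(follower);
    }

    /** Returns the users who follow both userA and userB. */
    Set<Long> commonFollowers(long userA, long userB) {
        Set<Long> result =
                new TreeSet<>(followersOf.getOrDefault(userA, Collections.emptySet()));
        result.retainAll(followersOf.getOrDefault(userB, Collections.emptySet()));
        return result;
    }

    public static void main(String[] args) {
        FollowGraph graph = new FollowGraph();
        graph.follow(1L, 10L);
        graph.follow(2L, 10L);
        graph.follow(2L, 20L);
        graph.follow(3L, 20L);
        System.out.println(graph.commonFollowers(10L, 20L));
    }
}
```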
Cassandra is used by Twitter for large
scale data mining, a geo database and realtime analytics. Lucene is
used for searches on the geo database.
Rainbird, part of Cassandra, is used for time series analytics.
Cuckoo is used for cluster monitoring (not open source yet).
Hadoop is used for offline processing at
Twitter. 1000 machines, billions of API requests, 12 TB of ingested
data and 95 million tweets per day generate a huge amount of data, for
which an OLAP database is not a good fit. Hadoop scales well to
large data sizes, but it is slower than a specialist OLAP DB. Twitter
uses a hybrid approach, with Vertica used for table aggregations and
Hadoop for logs etc. Scribe (originating from Facebook) is used for
logging. Hadoop receives 12 TB of data per day.
Elephant-Bird is a
library for working with data in Hadoop. Thrift, Avro and Protocol
Buffers are serialization frameworks, which give a compact description
of data and are backwards compatible. Very useful for logging data for
later data analysis. Elephant-Bird uses Protocol Buffers for dealing
with Hadoop I/O Format.
HBase and Pig (a declarative dataflow language) are used for
analytics within Twitter. Howl is
an abstraction to seamlessly work with Pig and Hive.
Recommendations:
Wow, what a firework of tools, many of which I hadn’t even heard about. I guess
there is quite a lot to catch up on in order to follow the latest data
modeling trends. Good talk, though probably a bit too much new stuff for me.
Activiti is a new BPM project led by the former jBPM head Tom Baeyens
under the umbrella of Alfresco. It is licensed under the Apache
License and is a BPMN 2.0 engine. Activiti can be embedded in any Java
environment and is extensible. One of the technical advantages of
Activiti compared to jBPM is its Spring support from the very
beginning. Quite a bunch of tools surround Activiti:
An example of a simple BPMN 2.0 notation used by Activiti looks like:
<?xml version="1.0" encoding="UTF-8"?>
<definitions id="definitions"
xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
targetNamespace="http://www.activiti.org/bpmn2.0">
<process id="helloWorld">
<startEvent id="start" />
<sequenceFlow id="flow1" sourceRef="start" targetRef="script" />
<scriptTask id="script" name="HelloWorld" scriptFormat="groovy">
<script>
System.out.println("Hello world")
</script>
</scriptTask>
<sequenceFlow id="flow2" sourceRef="script" targetRef="theEnd" />
<endEvent id="theEnd" />
</process>
</definitions>
This is how Activiti uses this process:
// Bootstrap
ProcessEngine processEngine = new DbProcessEngineBuilder()
.configureFromPropertiesResource("activiti.properties")
.buildProcessEngine();
ProcessService processService = processEngine.getProcessService();
// Deployment
processService.createDeployment()
.addClasspathResource("hello-world.bpmn20.xml")
.deploy();
// Run
processService.startProcessInstanceByKey("helloWorld");
A more or less real-world example (obtaining a loan from a bank) was
introduced and clicked through. It included integration with Alfresco,
where documents were created and managed. Excel integration is there
as well.
Activiti has nice JUnit support for unit testing your
processes using custom annotations. There is also a query API for querying
process instances.
In a 1-minute crash movie, Joram demonstrated how easy it is to set up
Activiti with a default setup along with all those nice tools.
It is really impressive what Activiti has achieved in the few months of
its existence. I’m pretty sure that Activiti is (or will become) the king
of open source BPM, and maybe beyond. Activiti is definitely worth a
try.
BTW, I have never seen a speaker (Joram Barrez) overtake himself while
speaking that fast ;-)
The speaker started his session by mentioning that he has to recover
from 9 years of Java development, which made me crack up a bit :-) Akka
is a technology which can be used from both Scala and Java.
He continued by listing the vision behind it: that it is simple to write
concurrent, fault-tolerant and scalable applications using Akka.
Here is the overview he presented:
Akka is built around the actor programming model and indeed seems
very easy to use.
Here is an example in Scala which I copied from http://akkasource.org:
// server code
class HelloWorldActor extends Actor {
def receive = {
case msg => self reply (msg + " World")
}
}
RemoteNode.start("localhost", 9999).register(
"hello-service", actorOf[HelloWorldActor])
// client code
val actor = RemoteClient.actorFor(
"hello-service", "localhost", 9999)
val result = actor !! "Hello"
Note that !! (“bang bang”) is not special syntax but a method with a
symbolic name. In the Java API the equivalent method is called “sendRequestReply”.
A test project using Akka is online
Other topics the speaker mentioned:
Transaction demarcation is very nice, by the way, if you use Scala:
atomic {
...
atomic {
// transactions compose!!!
}
}
If you want to learn more about Akka go to this link.
By the way, it is open source.
The technology seems to be quite cool and probably deserved a better speaker :-(
David is obviously an expert on Java and user interfaces. He has written
an impressive number of books, covering Swing, JavaServer Faces (JSF), Advanced
JSP, the JSP Standard Tag Library, and the Google Web Toolkit.
His demo was quite enjoyable. He (re)coded on the fly a nice little
web app called “Places”, which contains content from Yahoo! Maps, not
without some errors in Eclipse. His comment on that was: “That’s why
when I’m at home I pay for IntelliJ.”
I found the slides for his demo also here
There is also a Quake demo on YouTube. Quake is running inside of a
browser. This program was made with GWT.
David came up with some news about some features in GWT 2.0:
I think it would be fun to play around a bit with that technology and maybe use it in my own
programs.
Let’s talk about garbage! On mobile devices garbage matters! Garbage
that is generated every time an animation runs on your mobile device
can cause serious problems. So keep garbage at a minimum level when
dealing with mobile devices, just like you would in normal life :)
Chet Haase and Romain Guy talked about tips and tools for tracking
down performance problems and memory leaks on mobile devices.
Autoboxing creates objects! If you do not need an object type, use
primitive types instead so that allocation is minimized.
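A small plain-Java sketch of what that means (my own example, not from the slides): both methods compute the same sum, but the first one uses a boxed `Long` as accumulator. Each `+=` then unboxes the value, adds to it and boxes the result again via `Long.valueOf`, which creates a new object for values outside the small cached range, unless the runtime manages to optimize that away.

```java
/** Compares a boxed accumulator with a primitive one. */
final class BoxingDemo {

    /** Boxed accumulator: every += unboxes, adds and boxes the result again. */
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i; // equivalent to: total = Long.valueOf(total.longValue() + i)
        }
        return total;
    }

    /** Primitive accumulator: no wrapper objects involved. */
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000) == sumPrimitive(1000));
    }
}
```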
if (nodeList.size() > 0) {
for(Node node : nodeList) {
//do something
}
}
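Free resources explicitly: clear references you no longer need (myBitmap = null)
and recycle bitmaps yourself (myBitmap.recycle()). Do not wait for the
garbage collector to do it for you.
Generics only work with object types, so a declaration like
MyClass<Float> myObject = new MyClass<Float>(); leads to boxing as well.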
Responsiveness. The single-threaded UI on mobile devices requires
special care! If you block the UI thread, you block the user
interaction. Instead, use async tasks with messaging or
handlers with messaging.
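The pattern behind this advice can be sketched in plain Java, without any Android classes: do the slow work on a worker thread and hand only the result back to the UI thread. In the sketch below a single-threaded executor merely stands in for the UI thread, and `UiOffloading` and `loadThenShow` are names I made up; on a real device you would use the platform's own async task or handler mechanism instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch: slow work runs on a worker, only the result is delivered to the "UI" executor. */
final class UiOffloading {

    static <T> CompletableFuture<Void> loadThenShow(
            Supplier<T> slowLoad, Executor worker, Executor uiThread, Consumer<T> show) {
        return CompletableFuture
                .supplyAsync(slowLoad, worker)   // runs off the UI thread
                .thenAcceptAsync(show, uiThread); // result is handed to the UI thread
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        // Stand-in for the UI thread; an assumption of this sketch.
        ExecutorService ui = Executors.newSingleThreadExecutor();
        loadThenShow(() -> "bitmap loaded", worker, ui,
                result -> System.out.println(result)).join();
        worker.shutdown();
        ui.shutdown();
    }
}
```

The UI thread never waits for the slow call; it only receives a short "show the result" task once the work is done.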
Mark Reinhold is Chief Architect of the Java Platform Group at Oracle,
where he works on the Java Platform, Standard Edition, and OpenJDK.
This session is about how Java 7 or later will handle application
construction, packaging and publication. In other words: how do we get
rid of the JAR hell we have now?
Mark explained that in Jigsaw they have already resolved a lot of
problems. These solutions will come along with Java 7.
The main solution is: The Modular Java Platform
which enables escape from JAR hell by:
The Module system requirements are:
Here are some examples of how modules can be declared:
//module-info.java
module com.foo {
class com.foo.Main;
...
}
//module-info.java
module com.foo {
requires org.bar.lib;
requires org.baz.lib;
}
//module-info.java
module com.foo @ 1.0.0 {
requires org.bar.lib @ 2.1-alpha;
requires org.baz.lib @ 2.0;
}
//module-info.java
module com.foo @ 3 {
permits org.bar.lib;
}
//module-info.java
module com.foo {
requires org.bar.lib;
requires optional com.foo.extra;
}
$ javac -modulepath mods src/com.foo.app/...
$ ls mods
com.foo.app/
com.foo.extra/
com.foo.lib/
$ jpkg -m mods deb com.foo.app com.foo.lib
Kito Mann is the author of “JavaServer Faces in Action” and he runs
the http://jsfcentral.com website.
The whole talk is based on the book “Getting Things Done” by David
Allen.
The talk begins with a description of a programmer’s daily life, being
bombarded with emails, tweets, phone calls and meetings.
All of this results in too many things to do - sounds familiar to me.
He uses the picture of unclosed loops for that.
He describes the goal of “GTD” as closing those loops to avoid
constantly thinking about them, leaving more energy to get things done.
GTD works like this:
Then Mr Mann started talking about tools which can be used to do GTD.
I left at this point …
The talk was very good overall and worth attending. I think “Getting
Things Done” has some interesting ideas in it but is too much of a
process for me. It seems very restrictive and not flexible enough. But
I will definitely try out closing my mail client from time to time to
get things done…