Java, Javascript, Python, Linux, Node.js.... @gokulvanan - Tumblr Blog

Talk on how we run Flash Sales at Flipkart


Talk on Hbase Customizations done in Scaling OLTP systems.

#hbase #oltp #scale #datastore

JVM Memory Model and Garbage Collection

A well-known aspect of running code in languages that use the JVM is garbage collection: the mechanism by which background threads release unused memory and limit fragmentation.

In Java programs, memory management is abstracted away by the JVM. When a JVM process starts, the kernel allocates it memory as virtual addresses that map to user-space memory. (Note: I am not sure how much is allocated at start and whether this is fixed or grows on demand; I will update this in future edits.)

Once memory is allocated to the JVM process, it's up to the JVM to manage the memory blocks, i.e. allocate memory and reclaim what is unused.

In this blog I will try to describe the JVM memory model and how garbage collection works, and point out the JVM parameters that can be used to configure GC settings. I will also mention a few tools that help in debugging the JVM in production.

Note: This is a WIP blog. I will update this as I learn more.

Java memory model:

The memory model in jvm can be broken into the following components:

MetaSpace

JIT Code cache

Thread Stacks

Heap
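As a rough way to see these regions on a running JVM, the standard java.lang.management API lists the memory pools the JVM exposes. This is only a sketch: the class name MemoryPools is made up for illustration, pool names differ between collectors and JVM versions, and thread stacks are not reported as a pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original post.
class MemoryPools {
    // Returns one line per memory pool the running JVM exposes.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() is HEAP or NON_HEAP; getUsage() can be null for an invalid pool.
            long usedKb = pool.getUsage() == null ? -1 : pool.getUsage().getUsed() / 1024;
            lines.add(pool.getType() + " | " + pool.getName() + " | used=" + usedKb + "KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot JVM the non-heap pools typically include Metaspace and one or more code cache pools, and the heap pools correspond to the eden, survivor and old generation spaces of the collector in use.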

MetaSpace:

This region of memory was known as PermGen space up to JDK 7 (Permanent Generation, so named because its contents were expected to rarely need collecting). Since JDK 8 it has been replaced by Metaspace, which lives in native memory rather than in the heap.

This is the memory space used to store the metadata of classes loaded by the JVM class loaders. Since a typical Java application links to many libraries, all of their classes need to be loaded into metaspace, which is designed to grow on demand.

Issues arise if the class loaders end up loading more class data than fits in physical RAM. Linux hands a process only virtual memory addresses, so if swap is enabled some of the process memory will be moved to the swap file, which degrades the performance of the Java process.

Alternatively, JVM args can be specified to limit the size of metaspace. This caps the memory used, and the JVM fails with an OutOfMemoryError when the limit is exceeded.

-XX:MaxMetaspaceSize=128m // caps metaspace at 128 MB (-XX:MetaspaceSize, by contrast, only sets the initial threshold at which a GC is triggered to reclaim metaspace)

Note: in typical long-running server applications the number of loaded classes increases during server bootstrap and then remains fairly static. Exceptions are applications that load classes at runtime, for example over the network, which is in fact dangerous from a security standpoint.

JIT CodeCache:

If you recall Java 101 basics, Java takes your source code in *.java and compiles it to byte code in *.class. This is still not machine-specific code (machine code varies with your OS and CPU architecture). Byte code is run on the JVM, which starts by interpreting it. The JVM also has a JIT (just-in-time) compiler that converts byte code to machine code: when the JVM determines that a block of code is executed frequently, it compiles it and caches the native code in the code cache, which avoids interpreting or recompiling it each time.

Basic JVM args for setting the code cache size: -XX:InitialCodeCacheSize=32m (initial size) and -XX:ReservedCodeCacheSize (maximum size).

ThreadStacks:

Part of JVM memory used to store a stack for each thread running in the JVM.

More on this will be updated later.

Heap:

The major part of the jvm memory which is used to store objects.

So when you do:

String a = "test";

You are creating a String object "test".

Internally it is represented by an instance of String.class

in which "test" is held inside a char[].

Size of the char array = 2N + 24 // number of chars N = 4 => 32 bytes

Total size of the object "test" ⇒ 32 + 16 (object overhead) + 4 (hash) + 4 (buffer) = 56 bytes. (The size is an approximate estimate; I could be wrong in the calculation of the object overhead and the array, and the exact numbers depend on the JVM version and settings.)

So roughly 56 bytes need to be allocated by the JVM in order to process the expression

"test";

The JVM allocates these 56 bytes in the heap. Another 8 bytes are allocated for the reference "String a", which stores the address of the object "test".

Heap size in jvm can be controlled using -Xms and -Xmx (or: -XX:InitialHeapSize and -XX:MaxHeapSize)

Eg: java -Xms128m -Xmx2g MyApp

Note: this heap is cleaned up by the garbage collector, which lets us keep creating new objects without worrying about destroying unused ones. We still have to stay below the configured MaxHeapSize, otherwise the JVM throws an OutOfMemoryError.

The heap is broken down into 2 regions:

Young Gen

Old Gen (Tenured region)

YoungGen, as the name implies, is the region where all objects are created and where short-lived objects live.

YoungGen is further broken into 2 components.

Eden

Survivor Spaces (S0 and S1)

Eden is where all objects are created first.

Garbage Collection:

There are 2 kinds of GC: minor GC and major GC.

Minor GC:

Note: some details given below vary with the type of garbage collector in use.

Minor GC runs when the JVM is not able to get enough memory from Eden.

Minor GC looks for blocks of memory that are still in use by walking the thread stacks and following their pointers to memory locations. (Note: this implies that threads need to be paused during the run so that the references don't change. Hence minor GC does cause a stop-the-world pause, which is usually very short.)

Minor GC can run on a single thread or on multiple threads, depending on the collector in use.

Once it has identified the unused memory in Eden, it runs a copy collection: all live data is moved to a survivor space so that Eden can be cleared.

The survivor space consists of 2 survivor spaces of equal size.

At any given point in time one survivor is the ToSpace and the other is the FromSpace.

Say S0 is the ToSpace in the first run, S1 is the FromSpace, and both are empty: live data from Eden is copied to S0.

In the second run S1 is the ToSpace, so both the data from Eden that was not collected and the old data in S0 that survived the first GC are pushed to S1.

Note: if an object is too big to fit into S0 or S1 during a collection, it is pushed into the Tenured region. (This is called premature promotion, one of the issues you can run into if you are not cognizant of your data size and have not allocated sufficient survivor and eden space.)

After each collection the object's age is incremented, so a long-living object flip-flops between S0 and S1 at each minor GC.

If an object survives longer than a configured number of collections (the tenuring threshold, 15 by default), it is copied over to the OldGen / Tenured region.

We know that to configure the heap space we can use

-Xms512m -Xmx512m

Now to configure the portion of the heap used by Young Gen we can use

-XX:NewRatio=2 -XX:SurvivorRatio=8

These options are not quite straightforward and need some explanation.

NewRatio is the ratio OldGen / YoungGen within the total heap.

Hence NewRatio=2 implies that ⅓ of the heap is used for YoungGen. In this case close to 171 MB is used for YoungGen and 341 MB for OldGen.

SurvivorRatio is the ratio Eden / one survivor space, so SurvivorRatio=8 implies that each survivor space is ⅛ the size of Eden. YoungGen is Eden plus two survivor spaces, i.e. 8 + 1 + 1 = 10 parts.

Hence Eden is 8/10 of YoungGen and each survivor space is 1/10.

For the above example that would mean

Eden size would be 8/10 * 171 MB = 137 MB approx.

Survivor spaces S0 and S1 would each be 17 MB approx.
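The arithmetic can be checked with a few lines of Java. This is a sketch that assumes the HotSpot definitions NewRatio = OldGen / YoungGen and SurvivorRatio = Eden / one survivor space; the class and method names are made up for illustration, and a real JVM rounds and aligns these sizes.

```java
// Hypothetical calculator, not part of the original post.
class YoungGenSizing {
    // Returns {youngMb, oldMb, edenMb, survivorMb} for the given heap size and ratios.
    static double[] sizes(double heapMb, int newRatio, int survivorRatio) {
        double young = heapMb / (newRatio + 1);        // old / young == newRatio
        double old = heapMb - young;
        double survivor = young / (survivorRatio + 2); // eden / survivor == survivorRatio, two survivors
        double eden = young - 2 * survivor;
        return new double[] {young, old, eden, survivor};
    }

    public static void main(String[] args) {
        double[] s = sizes(512, 2, 8);
        System.out.println(String.format(
            "young=%.1fMB old=%.1fMB eden=%.1fMB survivor=%.1fMB each", s[0], s[1], s[2], s[3]));
    }
}
```

For -Xmx512m with NewRatio=2 and SurvivorRatio=8 this gives roughly 170.7 MB young, 341.3 MB old, 136.5 MB eden and 17.1 MB per survivor space.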

Note: the above options are relative ways of sizing the generations. Alternatively, we can specify an absolute size for the young generation, which overrides NewRatio, using

-XX:NewSize=171m -XX:MaxNewSize=171m

-XX:MaxTenuringThreshold=15 // changes the tenuring threshold after which an object moves to the tenured region; 15 is the default.

Major GC:

If the tenured region fills up, the JVM needs to trigger a major GC or full GC.

System.gc() and Runtime.getRuntime().gc() are hints that suggest the JVM initiate a GC.

A full GC removes unused objects in the tenured region and also tries to reclaim space in Metaspace: classes whose class loader is no longer reachable, and which therefore have no live objects on the heap, can be unloaded.

If a metaspace size threshold is provided (-XX:MetaspaceSize), a full GC is triggered to reclaim Metaspace when it is reached.

Jvm cmd line tools: (TODO will update this shortly)

jstack

jmap

jps

jcmd

jinfo

java -XX:+PrintFlagsFinal -XX:+UseG1GC -version

#java jvm

Idempotency in practice

One of the common issues in web services is the problem of network timeouts.

As a simple example, consider a user registration scenario. The user enters all his details and clicks the Register button, only to see the web page not responding. He clicks again a few more times, and after one of the clicks he is redirected to the Welcome page.

All looks good, but we may end up with multiple entries for the user in the backend.

There are two approaches to avoid this problem:

Disable the button until the callback of the first click arrives.

Use transactional access to the datastore and rely on a unique constraint violation to catch the duplicate call.

The first approach is client-centric: it relies on the client ensuring it calls only once, to the extent of affecting user experience, and the backend has no defense if there is a loophole in the client code.

The second approach is more robust and forms the basis of idempotency handling.

In the world of microservices this issue is more prevalent, with multiple microservices handling a single user action. As an example, consider buying a product from an e-commerce website.

A single-click checkout could involve invoking multiple services: a service to get the user's address information, a service to check serviceability/availability of the product, a service for fetching and applying offers, a service for selecting and invoking payment channels, and a service for storing the order.

The above call flow is a high-level description. In reality the call graph can get more complex: it can have loops and is not always a DAG, and in these cases idempotency matters even more.

Before rushing into making all your APIs idempotent, it's important to consider when idempotency is needed.

For services that provide serviceability and availability lookups, idempotency handling is of no use. But for services that record a placed order or a payment request, it is needed to avoid processing a duplicate order or payment.

A rule of thumb: if your API only makes lookup or GET style calls, no idempotency handling is needed. It's only needed when you're making a PUT/POST call that creates or updates a persistent entity.

Idempotency implementation: (Not as simple as it looks)

An important aspect overlooked while implementing idempotency is the need for transactionality in checking for idempotency.

Let’s consider a simple implementation

if (idemStore.contains(idemKey)) {
    return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
} else {
    // execute logic
}

This simple implementation has a race condition: 2 concurrent request threads can run through the if statement at around the same time, both see false, and both execute the logic in the else block.

A simple fix to the above logic is to use a mutex.

synchronized (idemStore) {
    if (idemStore.contains(idemKey)) {
        return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
    } else {
        // execute logic
    }
}

This prevents the race condition but increases latency because of lock contention.

An improvement over the above approach is to do all the logic outside the synchronized block and synchronize only while updating the data in the store. Contention is then limited to the latency of the store operation.

if (idemStore.contains(idemKey)) {
    return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
}
// execute logic
synchronized (idemStore) {
    if (idemStore.contains(idemKey)) {
        return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
    } else {
        store.update(data);
        idemStore.update(idemKey);
    }
}

An alternative to pessimistic locking is optimistic locking with a compare-and-swap (CAS) strategy, if your datastore offers one, such as a check-and-put operation in HBase or a put-if-absent operation (SETNX) in Redis. The idea is to store the idempotency key along with the data and a version using a CAS update. If the CAS fails because the stored version doesn't match the input version, you recheck the data in the store for the idempotency key and throw an idempotency abort if the key exists.

if (store.contains(idemKey)) {
    return store.get(idemKey); // may choose to get or throw IdemAborted exception here
}
// execute logic
boolean success = store.checkAndUpsert(data, idemKey, version);
if (!success) return store.get(idemKey);

The last approach is what is primarily used in practice in distributed systems, as the first 2 approaches require a distributed lock when running multiple instances of stateless app services. ZooKeeper and Hazelcast are tools that can be leveraged to build a distributed lock, but distributed locking at high scale proves very inefficient. It's best to handle this at the persistence layer, which, if well distributed and well sharded, serializes writes at the right partition and reduces the problem to an in-memory mutex, which is simpler.
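The compare-and-swap flow can be sketched in plain Java with a ConcurrentHashMap standing in for the datastore, since its putIfAbsent is an atomic put-if-absent. Everything here is an assumption made for illustration: the class IdempotentOrders, the placeOrder method and the string result format are hypothetical, and a real store would also carry the data and version as described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service; the map is an in-memory stand-in for a CAS-capable datastore.
class IdempotentOrders {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
    final AtomicInteger executions = new AtomicInteger(); // counts how often the logic ran

    String placeOrder(String idemKey, String payload) {
        String existing = store.get(idemKey);
        if (existing != null) {
            return existing; // duplicate call: return the stored result
        }
        // execute logic (may run more than once under a race, but only one result is kept)
        String result = "order:" + payload + "#" + executions.incrementAndGet();
        // atomic put-if-absent: returns null if we won, else the value stored by the winner
        String winner = store.putIfAbsent(idemKey, result);
        return winner != null ? winner : result;
    }

    public static void main(String[] args) {
        IdempotentOrders service = new IdempotentOrders();
        System.out.println(service.placeOrder("key-1", "book"));
        System.out.println(service.placeOrder("key-1", "book")); // same result as the first call
    }
}
```

Note that two racing callers can both run the logic before one of them loses the putIfAbsent, so any side effect inside the logic has to be safe to repeat or be part of the same atomic write.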

#idempotency #distributed systems

Java Threads

I have been working for quite some time in Java but realized that I have never actually used or gone deep into the basic abstractions in Java Threads. With java.util.concurrent package present and providing easy to use abstractions over the fundamental thread model in Java, one doesn't really find the need to use a Thread class and understand its working model. But knowing what goes down underneath sure does give better insights when writing code.

This blog recalls some of the basic concepts of Java threads and explains a few higher-level abstractions derived from them. Let's start by trying to understand what a thread is.

A thread is a lightweight process created with its own local context variables, priority and thread stack so that it can run independently within the program.

The above might seem like gibberish to a few, so let me explain the terms in a simpler way:

lightweight process -> Similar to a Unix process, the fundamental difference being that lightweight processes don't each get an independent memory space allocated by the kernel; they share the memory of the JVM process. Each thread in the JVM process is mapped to a lightweight process in the kernel so that it can use kernel-based scheduling of threads. (Note: this is why multithreading behaviour is not the same across platforms. It depends on how each platform's kernel schedules threads.)

priority -> A number from 1 to 10 that tells the thread scheduler the importance of a thread, so that it can be given preference when using the processor. More on this when we discuss thread scheduling.

threadStack -> Each thread has a stack during execution to maintain partial computation results. The size of the stack can be requested on thread creation, as the stack is specific to that thread. Note that this is specific to the JVM and not the OS.

Thread Types:

In Java there are 2 types of threads: user threads and daemon threads.

The JVM keeps running as long as at least one user thread is alive. Daemon threads are background threads, and the JVM will quit even while they are running once no user threads are alive.

So when you run public static void main, the JVM creates a number of daemon threads for garbage collection etc., but only one user thread, main, to kick-start main. If you create more user threads from main, then even after the main thread terminates the JVM will run until all user threads terminate.
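A small sketch of the difference, with a made-up class DaemonDemo and helper newWorker: the worker sleeps for a minute, but because it is marked as a daemon the JVM does not wait for it once main returns. Passing false instead would keep the JVM alive until the worker finishes.

```java
// Hypothetical demo, not part of the original post.
class DaemonDemo {
    // Creates (but does not start) a worker that sleeps for a minute.
    static Thread newWorker(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: just exit
            }
        });
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    public static void main(String[] args) {
        Thread background = newWorker(true);
        background.start();
        System.out.println("main is daemon? " + Thread.currentThread().isDaemon());
        System.out.println("background is daemon? " + background.isDaemon());
        // main returns here; the JVM exits even though background is still sleeping
    }
}
```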

Thread States:

A thread has 3 fundamental states:

NEW - Thread has just been created and not yet started.

RUNNABLE - Thread is running. (Note: this doesn't guarantee the thread is actually running on a processor; that still depends on scheduling, more on that later in this post.)

TERMINATED - Thread finished execution. (There is no recovery after termination. A terminated thread is a dead thread.)

A RUNNABLE thread can move into other states:

WAITING - when the thread has called Object.wait() or Thread.join() and is waiting, for example for the completion of another thread.

TIMED_WAITING - when the thread has called Thread.sleep(timeInMilliseconds), Object.wait(timeInMilliseconds) or Thread.join(timeInMilliseconds).

BLOCKED - when the thread is blocked trying to obtain the lock of a synchronized scope.

All of the above states are more or less similar: the thread is not doing anything useful but waiting, and it resumes its work by going back to RUNNABLE.
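These states can be observed with Thread.getState(), which returns a java.lang.Thread.State value. The sketch below (class and method names are made up) records the state of a worker before start, while it is parked in wait(), and after it has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not part of the original post.
class ThreadStateDemo {
    static List<Thread.State> observe() {
        final Object lock = new Object();
        final boolean[] released = {false}; // guarded by lock
        List<Thread.State> seen = new ArrayList<>();

        Thread worker = new Thread(() -> {
            synchronized (lock) {
                while (!released[0]) {
                    try {
                        lock.wait(); // releases the monitor and parks the thread
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });

        seen.add(worker.getState()); // not started yet
        worker.start();

        Thread.State state;
        while ((state = worker.getState()) != Thread.State.WAITING) {
            Thread.yield(); // spin until the worker is parked in wait()
        }
        seen.add(state);

        synchronized (lock) {
            released[0] = true;
            lock.notifyAll();
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        seen.add(worker.getState()); // finished
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints [NEW, WAITING, TERMINATED]
    }
}
```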

Thread Scheduler:

All the concepts of threads and parallel execution are great, but when it comes to hardware we are limited by the number of physical CPU cores in our system, and parallelism is bound by that number. (With hyper-threading, Intel's SMT - "Simultaneous Multithreading" - design, the OS sees 2 logical cores for each physical core. But the number of cores is still a small finite number.) Fortunately each CPU core runs at a high clock rate, measured in GHz. So when you hear dual core 2.4GHz, it means you have 2 cores each running 2.4 * 10^9 clock cycles per second. Since the cores are so fast, they can manage many different processes by scheduling them and giving each its turn at using the processor. In the case of a dual core you can think of 2 queues feeding the processor. Now comes the question of which of the queued threads are more important and should be scheduled earlier. This is defined by the priority of a thread.

This also means that while one thread is running, other threads are waiting. The waiting time is small because processors are very fast, but if the processors are slow or the computation of one thread is very long, other threads could wait for a long time. To avoid this, kernels implement time slicing: each thread is given a time slice, after which the next thread is run. If the current thread has not completed within its time slice, it is put on hold and pushed back to the scheduling queue and the next thread is taken for execution. This prevents thread starvation.

There are two types of Schedulers:

Green Scheduler - JVM based - no Time slicing

Native Scheduler - Kernel Based (Time Slicing)

By default the native scheduler is used to take advantage of kernel time slicing, but this makes the scheduling behaviour of multithreaded applications platform dependent.

Now when does a scheduler switch from one thread to another? There are 4 cases:

The running thread finishes within its time slice.

The running thread is not finished but its time slice is used up.

The running thread does blocking IO and is waiting, for example for a network server to come back with a response.

A higher priority thread wakes up on an IO response, and the current thread is preempted (software interrupt) and goes back to the queue while the higher priority thread takes the processor.

Note: this is why in non-blocking Java NIO based servers, IO threads are set at a higher priority than task worker threads. We want IO to keep running and accepting connections, and not be held up because workers are doing compute-intensive work and taking more processor time.

Thread Methods:

sleep, isAlive, join, setPriority, yield, synchronized, volatile, ThreadLocal, wait, notify, notifyAll, interrupt(), interrupted(), isInterrupted()

The above methods and keywords help when working with threads:

Thread.sleep(timeInMilli); // used to put the current thread to sleep

isAlive(); // instance method that returns a boolean indicating whether the thread has been started and has not yet terminated

A combination of the above 2 is used when you want a master thread to create a child thread and then wait for the child to complete before it terminates:

Thread childThread = new ChildThread();

childThread.start();

while(childThread.isAlive()){

Thread.sleep(200);

}

join(); - when called on the child thread, makes the calling thread wait for that child thread to finish before continuing.

So essentially it simplifies the above code:

Thread childThread = new ChildThread();

childThread.start();

childThread.join(); //will wait till childThread is completed

//other variants include join(timeInMilli); wait till completion or time

Another use case of join is an indefinite wait:

Thread.currentThread().join(); // calling join on itself is an indefinite wait

This is a better alternative to

while(true){

Thread.sleep(100);

}

setPriority(int arg); arg takes values from 1 (MIN_PRIORITY) to 10 (MAX_PRIORITY); the default is 5 (NORM_PRIORITY).

It is usually set before starting the child thread, although it can also be changed while the thread is running. This helps to influence the scheduler as mentioned above.

yield() is a simpler way to influence the scheduler but has no guarantees.

Calling Thread.yield() hints to the scheduler that the calling thread is willing to give up the processor, so that it can be pushed back into the queue and the next thread picked based on priority. It could turn out that the same thread is picked again, hence no guarantees.

synchronized

This keyword is used to serialize access to the scope of code defined by its block. But care should be taken to understand which monitor the scope is locked on.

synchronized acquires the monitor of the object it is applied to. Think of the monitor as a field in the instance/class on which only one thread can hold the lock at a time, while other threads are queued.

synchronized (obj) {
    // in this scope only the one thread holding the monitor of obj will run
}

synchronized int method() {
    // in this instance method the monitor is that of the instance on which the method is called
}

static synchronized int method() {
    // here the monitor is that of the Class object of the class declaring this method
}
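A minimal sketch of an instance-level monitor in action, using a made-up SyncCounter class: several threads increment the same counter, and because increment() is synchronized on the instance no update is lost.

```java
// Hypothetical demo, not part of the original post.
class SyncCounter {
    private int count;

    synchronized void increment() { count++; }   // monitor: this instance
    synchronized int get() { return count; }     // same monitor, so reads see the latest value

    // Starts the given number of threads, each incrementing perThread times.
    static int runWorkers(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10_000)); // prints 40000
    }
}
```

Removing the synchronized keyword from increment() makes count++ a non-atomic read-modify-write, and the total will usually come out lower than expected.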

Points to keep in mind when using synchronized:

Synchronization is expensive: time is taken to acquire and release the lock.

Avoid nesting synchronized scopes on different monitors, e.g. one synchronized method calling a synchronized method of another object. (If two threads take the same locks in a different order, this leads to deadlock.)

The synchronized construct is good at doing its job but is not flexible enough. Some of its shortcomings are:

Threads waiting to acquire the lock wait indefinitely and cannot be interrupted.

With nested locking it is easy to run into deadlocks: synchronized is scope based, the lock is released only when the scope exits, and there is no timeout or failure. If the second nested lock is held by some other thread, we wait indefinitely instead of failing fast.

As an alternative, Java 1.5 introduced the Lock interface and implementations such as ReentrantLock and ReentrantReadWriteLock, which provide greater flexibility. More on them in a separate blog post. In this one I will continue to focus on the basics.

volatile:

Each process forked in the OS has its own copy of its memory, but a thread in the JVM, being a lightweight process, shares the JVM heap with other threads. Local variables created inside a thread live on its own stack and are local to it, while fields reachable by more than one thread are shared. While executing operations, however, each thread may keep intermediate results for these shared fields in processor registers and caches, so other threads can read stale values. To avoid this the volatile keyword is used. volatile ensures that when one thread changes the value of the variable, the write is flushed back to main memory (the JVM heap) so that other threads see it. This is implemented by adding extra instructions for the processor: memory barriers. Load and store barrier instructions are emitted around accesses to the variable to enforce the flush and the re-read.

The visibility problem does not occur when threads use synchronization to access shared field variables. When a thread acquires a lock, the thread's working-memory copies of shared field variables reload from their main-memory counterparts. Similarly, when a thread releases a lock, the working-memory copies flush back to the main-memory shared field variables.

Note: volatile does not provide synchronization. Consider the code below:

static volatile String test;

Thread t1 = new Thread(new Runnable() {
    public void run() {
        test = "Data";
        if (test != null && !test.isEmpty()) {
            // do something
        }
    }
});

Thread t2 = new Thread(new Runnable() {
    public void run() {
        test = null;
    }
});

t1.start();
t2.start();

// Suppose t1 is scheduled first
// it executes test = "Data", which is flushed and exposed to thread t2 also
// now t1 executes test != null, which returns true
// now t2 gets its timeslice of the processor
// t2 executes test = null, which is flushed and exposed to thread t1
// now t1 gets its timeslice
// t1 executes test.isEmpty(), which throws NullPointerException

From the above example we can see that synchronization and volatile are very different things and should be used appropriately based on what we are trying to do. volatile works well for a single shared field with simple read and write operations, whereas synchronization is needed when there is more than one field, or a single field with multiple steps of operation as in our example code above, where the null check is one operation and isEmpty the other.

ThreadLocal & InheritableThreadLocal:

ThreadLocal is a very useful abstraction that enables storing values per thread in the application. The primary use case is in web applications, to store the request context per thread.

This is also used in J2EE frameworks to simulate a session object using a cookie-based session that is passed on to the request context.

Basic usage of ThreadLocal

static ThreadLocal<SomeClass> context = new ThreadLocal<SomeClass>() {
    @Override
    protected SomeClass initialValue() {
        // initial value of this ThreadLocal for each thread
        return new SomeClass();
    }
};

Note: overriding initialValue is not mandatory, but it helps in creating a ThreadLocal with a default initial value.

Other methods include:

T get()

void set(T value)

ThreadLocal references are usually static, since multiple threads accessing a static shared object is where ThreadLocal is useful: it avoids synchronization because each thread has its own copy of the value. In fact ThreadLocal is not a data structure that stores values itself. The value is stored in the Thread instance that invokes it, and ThreadLocal provides a pattern for accessing that data from the thread.

More on internals of ThreadLocal will be written in a separate blog.

When building applications where a single request to the server spawns multiple threads, it's important to understand that the request context stored in the ThreadLocal is lost, as the newly created threads do not have access to it. To avoid this, a common practice is to clone the object and set it on the ThreadLocal in the first lines run by the new thread.

A better and more elegant alternative is to use InheritableThreadLocal. This passes the ThreadLocal context values of the parent thread on to its child threads. Note the word inherits: child threads can override the value, just like in inheritance.
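The difference shows up in a few lines. In this sketch (the class ContextDemo and the value "request-42" are made up) the parent thread sets both kinds of variable and a child thread reports what it sees.

```java
// Hypothetical demo, not part of the original post.
class ContextDemo {
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    // Returns what a freshly created child thread sees: {plain value, inherited value}.
    static String[] valuesSeenByChild() {
        PLAIN.set("request-42");
        INHERITED.set("request-42");
        String[] seen = new String[2];
        Thread child = new Thread(() -> {
            seen[0] = PLAIN.get();     // null: a plain ThreadLocal is not visible to the child
            seen[1] = INHERITED.get(); // copied from the parent when the child was created
        });
        child.start();
        try {
            child.join(); // join also makes the child's writes to seen visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        PLAIN.remove();
        INHERITED.remove();
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = valuesSeenByChild();
        System.out.println("plain=" + seen[0] + " inherited=" + seen[1]);
        // prints plain=null inherited=request-42
    }
}
```

The value is copied when the child thread is created, so with a thread pool the pooled threads keep whatever context existed when they were first created, not the context of the request that later submits work to them.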

wait() and notify() are methods of Object, hence any running thread can call them on any object in Java.

Calling the wait() instance method causes the calling thread to go to the WAITING state.

Calling the notify() instance method on the same object wakes up one of the threads waiting on it (which one is not specified), and that thread resumes running once it reacquires the monitor.

A use case of this is the publisher-subscriber model using a shared object, where you want to synchronize a publisher with a subscriber accessing shared state. I am not going to write that code down as it's very easily found on the internet.

There are other variants: wait(time), and notifyAll(), which wakes up all threads that had called wait.

Note: wait() and notify() should always be called inside a synchronized scope, else an IllegalMonitorStateException is thrown. A thread must hold the monitor lock of the object on which it intends to invoke wait() or notify().

This is important because only one thread at a time should call wait() or notify() for the behaviour to be predictable. If multiple threads are waiting and notifyAll is called by another thread, all of them wake up but are queued and run through the synchronized scope one by one.

interrupt();

This is an instance method of Thread that can be called by another thread. Its purpose is to interrupt a waiting thread, which then gets an InterruptedException.

When interrupt() is invoked on the reference of a running thread, two things can happen:

If the thread is running and not in sleep, wait or join, its boolean interrupted state is set to true, hence isInterrupted() will return true.

If the thread was in sleep, wait or join, that call throws InterruptedException. (Note: here the interrupted state is cleared and remains false.)

One way to check if a thread was interrupted is isInterrupted(); another way is the static

Thread.interrupted(); but this clears the state and makes it false.
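Both cases can be seen in a short sketch (the class InterruptDemo and its method names are made up): one method interrupts a sleeping thread and reports the flag inside the catch block, the other sets the flag on the current, running thread and reads it back with isInterrupted() and Thread.interrupted().

```java
// Hypothetical demo, not part of the original post.
class InterruptDemo {
    // Case 1: the thread is in sleep. Returns the flag as seen inside the catch block.
    static Boolean flagAfterInterruptedSleep() {
        Boolean[] flag = new Boolean[1];
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // the flag was cleared when the exception was thrown
                flag[0] = Thread.currentThread().isInterrupted();
            }
        });
        sleeper.start();
        sleeper.interrupt();
        try {
            sleeper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flag[0];
    }

    // Case 2: the thread is running. Returns {isInterrupted(), first interrupted(), second interrupted()}.
    static boolean[] flagOnRunningThread() {
        Thread.currentThread().interrupt();                       // only sets the flag
        boolean viaInstance = Thread.currentThread().isInterrupted(); // reads without clearing
        boolean first = Thread.interrupted();                     // reads and clears
        boolean second = Thread.interrupted();                    // already cleared
        return new boolean[] {viaInstance, first, second};
    }

    public static void main(String[] args) {
        System.out.println(flagAfterInterruptedSleep());                       // prints false
        System.out.println(java.util.Arrays.toString(flagOnRunningThread())); // prints [true, true, false]
    }
}
```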

Thread Hierarchy and Grouping:

Threads can be grouped to give them certain common characteristics, such as a max priority for the thread group.

By default threads in Java form a thread group hierarchy. The topmost thread group is system. system has many daemon threads and the sub thread group main. main has the thread main executing public static void main. The main thread can in turn create sub thread groups or sub threads.

ThreadGroup a1 = new ThreadGroup("a"); // creates a thread group under the main thread group, as the main thread is executing this

ThreadGroup b = new ThreadGroup(a1, "b"); // creates thread group b under thread group a

Thread th = new Thread(a1, "thread1"); // thread1 created under thread group "a"

a1.activeGroupCount(); // estimated number of thread groups under a, which would be 1

Thread.currentThread().getThreadGroup().activeCount(); // estimated number of live threads in the current group and its subgroups

Thread.currentThread().getThreadGroup().list(); // prints the group to standard output

print notation:

for threadGroup

java.lang.ThreadGroup[name=<threadGroupName>, maxpri=<maxPrioritySetForThisGroup>]

Thread[<threadName>,<priority>,<threadGroupName>] //thread under this group

//similarly subGroup here

//peer groups here

Note: a thread created inside a thread group with max priority 5 can never have a priority above 5. Even if we set its priority > 5, it is automatically reduced to the thread group's max.

To set a ThreadGroup's max priority use the tg.setMaxPriority(val) method.

Note: threads that were created in this group and given a higher priority prior to this call are not affected.
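Both notes can be checked with a small sketch. The class GroupPriorityDemo and the group name "workers" are made up, and the example assumes it runs in a parent group whose max priority is the usual default of 10.

```java
// Hypothetical demo, not part of the original post.
class GroupPriorityDemo {
    // Returns {priority of a thread configured before the cap, priority of one configured after}.
    static int[] priorities() {
        ThreadGroup group = new ThreadGroup("workers");
        Runnable noop = () -> { };

        Thread before = new Thread(group, noop, "before");
        before.setPriority(8);       // allowed: the group max is still the inherited default

        group.setMaxPriority(5);     // cap applies to priorities set from now on

        Thread after = new Thread(group, noop, "after");
        after.setPriority(8);        // silently reduced to the group max

        return new int[] {before.getPriority(), after.getPriority()};
    }

    public static void main(String[] args) {
        int[] p = priorities();
        System.out.println("before=" + p[0] + " after=" + p[1]);
    }
}
```

With the default parent group this reports 8 for the thread configured before the cap and 5 for the one configured after it.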

ThreadGroup enables interrupting a group of threads that are waiting, using

tg.interrupt();

ThreadGroups enable simple enumeration of threads:

Create an array of the size given by tg.activeCount() // number of threads, or

tg.activeGroupCount(); // number of thread groups

fill the array using the enumerate methods

int enumerate(Thread[] thdArray); // gets all threads, including threads in subgroups

int enumerate(Thread[] thdArray, boolean recurse); // use recurse = false to not go into subgroups

similar methods for ThreadGroup[].

This concludes the basic abstractions of threads in Java. I missed out a few, such as TimerTask and Timer, and will update them here later.

#Java #threads


User Facing System Deployment and Testing Tips:

When deploying a new version or a complete rewrite of an old system a lot of things can screw up. Here I am documenting my learnings on carrying out deployment of user facing systems in ecommerce space where such systems are expected to be Highly Available with zero down time. In short I would say, Make sure you have a proxy layer.

As you make major changes to your system, they are bound to affect your contracts with other systems. Consider a case where you choose to break a single system into smaller independent microservices; your API endpoints then change significantly. Building a proxy that acts as a router and also as an adaptor translating requests to your new contract specification is very useful. This way you can deploy independently and give other systems a deadline to move to the new contract, after which you shut down the proxy.

Another major advantage of using a proxy layer is being able to test and onboard the new system easily. Consider rewriting a mission-critical system in the user's path. The system is very complex with many use cases, and there is neither a sufficient staging setup to test on nor the time to spend on staging tests. Added to this, the new system is completely new: its data model is redesigned, its data store has been changed, and it has been broken into smaller microservices. In such cases, having the outside world talk to the old system through a proxy lets you run the new system in passive mode while monitoring it.

To run in passive mode, the trick is to use the proxy layer to fork each incoming request into two requests: one hitting the main system and the other hitting the new system. The proxy sends the output of the first back to the client and logs the output of the second alongside the main system's output. A cron job can then diff the outputs of both systems to find anomalies.
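The forking idea can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class name ShadowProxy is made up, requests and responses are plain strings, and both systems are modelled as in-process functions rather than HTTP backends. Our actual proxy was built on Nginx with Lua, not Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

class ShadowProxy {

    /** One logged pair of outputs for the same incoming request. */
    public static final class Comparison {
        public final String request;
        public final String primaryOutput;
        public final String shadowOutput;

        Comparison(String request, String primaryOutput, String shadowOutput) {
            this.request = request;
            this.primaryOutput = primaryOutput;
            this.shadowOutput = shadowOutput;
        }

        public boolean matches() {
            return Objects.equals(primaryOutput, shadowOutput);
        }
    }

    private final Function<String, String> primary; // old system, serves the client
    private final Function<String, String> shadow;  // new system, passive mode
    private final List<Comparison> log = new ArrayList<>();

    public ShadowProxy(Function<String, String> primary, Function<String, String> shadow) {
        this.primary = primary;
        this.shadow = shadow;
    }

    /** Forks the request to both systems; only the primary output is returned. */
    public String handle(String request) {
        String primaryOutput = primary.apply(request);
        String shadowOutput;
        try {
            shadowOutput = shadow.apply(request);
        } catch (RuntimeException e) {
            // A failure in the new system must never reach the client.
            shadowOutput = "ERROR: " + e;
        }
        log.add(new Comparison(request, primaryOutput, shadowOutput));
        return primaryOutput;
    }

    /** What the periodic diff job would report. */
    public List<Comparison> anomalies() {
        List<Comparison> result = new ArrayList<>();
        for (Comparison c : log) {
            if (!c.matches()) {
                result.add(c);
            }
        }
        return result;
    }
}
```

In a real proxy the call to the new system would typically be made asynchronously, so that it adds no latency to the response sent back to the client.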

How does this help? Running in passive mode surfaces the errors in the new system, which you can keep fixing and deploying. Run it for long enough and all your new data is migrated automatically, since passive mode has been creating entries in the new system. The outside world stays oblivious to the fact that you had two systems running, and to the moment you configured the proxy to cut over and point only to the second system.

Having said this, there is more to it than using a proxy to fork incoming requests. Outbound requests are where things get challenging: outbound calls by the primary and the secondary system can occur at different points in time, which makes anomaly detection on outbound calls difficult.

Here the primary problem is knowing which request payloads to compare once they are serialized. One way to solve it is to know which field in the payload to group on; say, both requests from the old and the new system carry a primary key such as an entity id. But this does not always work, and it's not the right way, as your anomaly detector is now specific to your business logic and coupled to it. It didn't work for us, as we were even changing the format of ids from int to string. The better way was to use metadata, such as request headers in the case of HTTP outbound requests. This is similar to the concept of request tracing in a microservice architecture.

If you have heard of Zipkin at Twitter you will know what I mean; if not, don't worry, I will try to give the gist of it in my context. The idea is to monitor and measure metrics across systems in a microservice architecture. Consider one request to a web page at Facebook hitting 50 services in the backend. Suppose you want to analyze a sample of such requests and see, as a tree starting from the user clicking on the homepage, how long each subsystem took, in order to understand bottlenecks.

At a high level this is achieved by using headers (metadata) to pass a trace id that identifies the request across the system. This only works when all systems agree to pass this metadata in their outbound calls. In big companies this gets standardized by implementing a generic reusable outbound proxy, such as Finagle at Twitter or, to some extent, the service proxy at Flipkart. Hence, by leveraging a similar trace id generated by our proxy, which forks incoming calls to the old and new systems, we can catch the subsequent outbound calls made by both systems and group them on the trace id.
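Grouping outbound calls on the trace id could look roughly like the sketch below. All names are hypothetical, and it assumes each captured outbound call has already been reduced to a trace id, the system it came from ("old" or "new") and a serialized payload.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class TraceDiff {

    /** A captured outbound call, tagged with the trace id taken from its headers. */
    public static final class OutboundCall {
        public final String traceId;
        public final String source;  // "old" or "new"
        public final String payload; // serialized request body

        public OutboundCall(String traceId, String source, String payload) {
            this.traceId = traceId;
            this.source = source;
            this.payload = payload;
        }
    }

    /**
     * Groups calls by trace id and returns the ids where the old and new
     * payloads differ, or where one of the two systems made no call at all.
     */
    public static List<String> mismatchedTraces(List<OutboundCall> calls) {
        Map<String, Map<String, String>> byTrace = new TreeMap<>();
        for (OutboundCall call : calls) {
            byTrace.computeIfAbsent(call.traceId, id -> new HashMap<>())
                   .put(call.source, call.payload);
        }
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : byTrace.entrySet()) {
            String oldPayload = entry.getValue().get("old");
            String newPayload = entry.getValue().get("new");
            if (!Objects.equals(oldPayload, newPayload)) {
                mismatched.add(entry.getKey());
            }
        }
        return mismatched;
    }
}
```

Because the calls are matched on the trace id rather than on a business field such as the entity id, the two systems can make their outbound calls at different times and the comparison logic stays decoupled from the payload format.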

With all these benefits, you might feel that having a proxy is a no-brainer. Not really: a proxy adds another hop, and all of the above is needed only when you have a complex system with many interacting services. It is unnecessary effort for a small-scale or monolithic system. As for what kind of proxy works best, we used OpenResty (Nginx with Lua); HAProxy is another amazing alternative. The only reason we didn't use HAProxy is that it didn't have plugins for embedding Lua to carry out our fork logic etc. To do it in HAProxy we would have needed to write C code, and we didn't see the need for that, as the Nginx setup met our scale requirements.

#distributed systems #microservices

Clojure Syntax Understanding from code

Reference notes for understanding Clojure by translating an existing GitHub Clojure project (https://github.com/gerritjvv/tcp-driver) to Java. Note this is a WIP and could have mistakes, especially in the interpretation of types and generics, but it will give you an idea of how to go about coding in a Clojure project.

Note: If you are new to clojure read the earlier post.

##########################################################

Commenting

;; is similar to // comment in java

##########################################################

Namespacing

ns declares a namespace, similar to package in java

(ns

^{:doc "TCP connection pools

see: create-tcp-pool"}

tcp-driver.io.pool

(:require

[tcp-driver.io.conn :as tcp-conn])

(:import

(org.apache.commons.pool2 KeyedObjectPool BaseKeyedPooledObjectFactory)

(java.net SocketAddress)

(org.apache.commons.pool2.impl GenericKeyedObjectPool GenericKeyedObjectPoolConfig)))

Above is similar to:

/**

* TCP connection pools

* see create-tcp-pool

*/

package tcp_driver.io.pool; // Java identifiers cannot contain '-'; Clojure maps '-' to '_' in package names

// This part is specific to Clojure:

// importing another Clojure file and giving its namespace an alias

(:require

[tcp-driver.io.conn :as tcp-conn])

// here conn.clj is another Clojure file whose namespace is aliased as tcp-conn; its public functions can then be called as tcp-conn/<function-name>

import org.apache.commons.pool2.KeyedObjectPool;

import org.apache.commons.pool2.BaseKeyedPooledObjectFactory;

import java.net.SocketAddress;

import org.apache.commons.pool2.impl.GenericKeyedObjectPool;

import org.apache.commons.pool2.impl.GenericKeyedObjectPoolConfig;

##########################################################

Interface

protocol is similar to interface in java

(defprotocol IPool

(-borrow [this key timeout-ms])

(-return [this key obj])

(-invalidate [this key obj])

(-close [this])

(-num-idle [this] [this key])

(-num-active [this] [this key]))

above is similar to

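// Protocol methods are untyped, so Object is used where the type is not known.

// Java names cannot contain '-', so camelCase is used instead.

public interface IPool {

Object borrow(Object key, long timeoutMs) throws Exception;

void returnObj(Object key, Object obj); // "return" is a reserved word in Java

void invalidate(Object key, Object obj) throws Exception;

void close();

int numIdle(); // [this]

int numIdle(Object key); // [this key]

int numActive(); // [this]

int numActive(Object key); // [this key]

}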

##########################################################

Class

eg of simple class

(defrecord HostAddress [^String host ^int port])

this in java is:

class HostAddress {

final String host; // record fields are immutable

final int port;

public HostAddress(String host, int port){

this.host=host;

this.port=port;

}

}

eg of class implementing interface

(defrecord KeyedTCPConnFactory [^GenericKeyedObjectPool pool]

IPool

(-borrow [_ key timeout-ms] (.borrowObject pool key (long timeout-ms)))

(-return [_ key obj] (.returnObject pool key obj))

(-invalidate [_ key obj] (.invalidateObject pool key obj))

(-close [_] (.close pool))

(-num-active [_] (.getNumActive pool))

(-num-active [_ key] (.getNumActive pool key))

(-num-idle [_] (.getNumIdle pool))

(-num-idle [_ key] (.getNumIdle pool key)))

this in java is:

// Note: key need not be a String (the borrow doc string says it is a HostAddress), so Object is used here.

// Java names cannot contain '-' and "return" is a reserved word, so the method names are adapted.

class KeyedTCPConnFactory implements IPool {

private final GenericKeyedObjectPool pool;

public KeyedTCPConnFactory(GenericKeyedObjectPool pool){

this.pool = pool;

}

public Object borrow(Object key, long timeoutMs) throws Exception {

return pool.borrowObject(key, timeoutMs); // timeout is the second arg to borrowObject

}

public void returnObj(Object key, Object obj){

pool.returnObject(key,obj);

}

public void invalidate(Object key, Object obj) throws Exception {

pool.invalidateObject(key,obj);

}

public void close(){

pool.close();

}

public int numActive(){

return pool.getNumActive();

}

public int numActive(Object key){

return pool.getNumActive(key);

}

public int numIdle(){

return pool.getNumIdle();

}

public int numIdle(Object key){

return pool.getNumIdle(key);

}

}

##########################################################

Basic function declared in clj

(defn borrow

"Params:

pool an instance of IPool

key an instance of tcp-driver.io.conn.HostAddress

timeout-ms long timeout in milliseconds

Exceptions: NoSuchElementException, Exception"

[pool key timeout-ms]

(-borrow pool key timeout-ms))

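here borrow is the function name

"text here" is the doc string (like Javadoc)

[pool key timeout-ms] are the args

(-borrow pool key timeout-ms) ;; body of the function, i.e. what gets executed

The body calls the protocol method -borrow on pool (the first arg, which plays the role of this) with args key and timeout-ms. The - prefix is only a naming convention marking the protocol method as internal; the language does not enforce it.

i.e. roughly pool.borrow(key, timeoutMs) in Java, dispatched to whichever IPool implementation pool is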

this function borrow is a public function of this namespace and can be used by other clj files that require this file, as in

(:require

[tcp-driver.io.pool :as tcp-pool])

(tcp-pool/borrow pool key timeout-ms)

Note: to declare a function's return type, add a type hint. A type hint helps the compiler avoid reflection; it is not a checked type as in Java.

(defn

^InputStream

input-stream [conn]

(-input-stream conn))

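here the ^InputStream type hint placed before the function name specifies the return type

followed by the function name

then an optional "<doc here>" doc string, which is absent in the above example

followed by the args

then a body which calls the protocol method -input-stream on conn, i.e. roughly conn.inputStream() in Java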

##########################################################

To summarize, a *.clj file starts with the namespace declaration and imports, followed by interfaces (protocols) if any, classes (records) if any, then global variables if any, and finally public functions.

eg. conn.clj

;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;namespace declaration and imports

(ns

^{:doc "TCP Connection abstractions and implementations

see host-address and tcp-conn-factory"}

tcp-driver.io.conn

(:import

(java.net InetAddress Socket SocketAddress InetSocketAddress)

(org.apache.commons.pool2 BaseKeyedPooledObjectFactory PooledObject KeyedPooledObjectFactory)

(org.apache.commons.pool2.impl DefaultPooledObject)

(java.io InputStream OutputStream)))

;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;Class HostAddress and ITCPConn interface

(defrecord HostAddress [^String host ^int port])

(defprotocol ITCPConn

(-input-stream [this])

(-output-stream [this])

(-close [this])

(-valid? [this]))

;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;Class SocketConn implementation of ITCPConn

(defrecord SocketConn [^Socket socket]

ITCPConn

(-input-stream [_] (.getInputStream socket))

(-output-stream [_] (.getOutputStream socket))

(-close [_] (.close socket))

(-valid? [_] (and

(.isConnected socket)

(not (.isClosed socket)))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;; Public API exposed from this clj file

(defn wrap-tcp-conn

"Wrap the Socket in a ITCPConn"

[^Socket socket]

(->SocketConn socket))

;;; here body (->SocketConn socket) is as good as return new SocketConn(socket); instantiation and return

(defn create-tcp-conn [{:keys [host port]}]

{:pre [(string? host) (number? port)]} ;; precondition map: each expression is asserted before the body runs; a failed check throws AssertionError

(->SocketConn ;; return new SocketConn

(doto (Socket.) ;; instantiate new Socket and

(.connect (InetSocketAddress. (str host) (int port))) ;; call connect on new InetSocketAddress with args and

(.setKeepAlive true)))) ;; call setKeepAlive on socket with args
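Read as Java, create-tcp-conn does roughly the following. This is a sketch with hypothetical names: the map destructuring {:keys [host port]} is modelled with a Map lookup, the :pre checks with an explicit method, and the SocketConn wrapper is left out so the example stays self-contained.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Map;

class CreateTcpConnSketch {

    // Stands in for {:pre [(string? host) (number? port)]}.
    public static void checkPre(Object host, Object port) {
        if (!(host instanceof String) || !(port instanceof Number)) {
            throw new AssertionError("Assert failed: (string? host) (number? port)");
        }
    }

    // Stands in for (defn create-tcp-conn [{:keys [host port]}] ...).
    public static Socket createTcpConn(Map<String, Object> opts) throws IOException {
        Object host = opts.get("host"); // {:keys [host port]} pulls :host and :port out of the map
        Object port = opts.get("port");
        checkPre(host, port);

        // (doto (Socket.) (.connect ...) (.setKeepAlive true)):
        // doto calls each method on the same new object and returns that object.
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress((String) host, ((Number) port).intValue()));
        socket.setKeepAlive(true);
        return socket; // the Clojure version wraps this in a SocketConn record
    }
}
```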

#clojure

Clojure Learning Notes -WIP has many typos

This post contains ad hoc notes I took while listening to a talk introducing Clojure. It expects you to know programming concepts well, preferably in Java.

Fundamentals

Dynamic - dynamically typed

a new Lisp, not Common Lisp or scheme
