Top Posts Tagged with #msgpack

I looked at it again after a long while and it had become a Composer library. I think it used to be more like "please install the PHP extension."

#php #msgpack


Memory use is often overlooked when people compare JSON libraries

TL;DR: In decode-oriented use cases with big payloads, JSON decoders often use disproportionate amounts of memory. I gave up on JSON and switched to Msgpack.

You should draw your own conclusions by running the test code yourself.

Based on various feedback [*], I've redone the benchmarks, using ru_maxrss instead of Valgrind and with a few more implementations.

Processing one large file (240 MB of JSON)

Peak memory usage (Python 3.5):

| Implementation | Peak memory |
|---|---|
| marshal | 372.1 MB |
| pickle | 372.9 MB |
| msgpack | 376.6 MB |
| rapidjson | 668.6 MB |
| yajl | 687.3 MB |
| ujson | 1,578.9 MB |
| json | 3,422.3 MB |
| simplejson | 6,681.4 MB |

Speed (Python 3.5):

| Name | Min time (ms) | Relative to fastest |
|---|---|---|
| test_speed[msgpack] | 69.0613 | 1.0 |
| test_speed[pickle] | 69.9465 | 1.01 |
| test_speed[marshal] | 74.9914 | 1.09 |
| test_speed[rapidjson] | 337.5243 | 4.89 |
| test_speed[ujson] | 902.8647 | 13.07 |
| test_speed[yajl] | 1,195.4298 | 17.31 |
| test_speed[json] | 4,404.9523 | 63.78 |
| test_speed[simplejson] | 6,524.9919 | 94.48 |

Bottom line

Both speed and memory are affected by data shape. Speed is not always proportional to memory use.

Again, don't trust the numbers: run the benchmarks yourself, with your own data. Even if your data has an identical shape, your hardware might behave differently than mine. Even memory use can differ on your machine (for example, because of a different architecture or different shared libraries). And what's the chance that your data has exactly the same shape as whatever was used in the benchmark?
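If you want to run your own comparison, below is a minimal Ruby sketch of a decode-speed harness. It is not the author's Python test code. The payload shape and the `build_payload`, `codecs` and `decode_timings` helpers are hypothetical. msgpack is only included when the `msgpack` gem is installed. The harness measures time only. Peak memory is better measured in a separate process per decoder, as the post does with ru_maxrss, because a process's peak RSS never goes down.

```ruby
require 'json'
require 'benchmark'

# Hypothetical synthetic payload: results depend heavily on data shape,
# so substitute your own data.
def build_payload(records)
  Array.new(records) do |i|
    { 'id' => i, 'name' => "user#{i}", 'score' => i * 0.5,
      'tags' => %w[a b c], 'active' => i.even? }
  end
end

# name => [encode, decode]; msgpack is added only if the gem is installed.
def codecs
  list = {
    'json'    => [->(o) { JSON.generate(o) }, ->(s) { JSON.parse(s) }],
    'marshal' => [->(o) { Marshal.dump(o) },  ->(s) { Marshal.load(s) }]
  }
  begin
    require 'msgpack'
    list['msgpack'] = [->(o) { MessagePack.pack(o) }, ->(s) { MessagePack.unpack(s) }]
  rescue LoadError
    # msgpack gem not available; compare the stdlib codecs only.
  end
  list
end

# Seconds spent decoding the pre-encoded payload `rounds` times, per codec.
def decode_timings(payload, rounds: 5)
  codecs.each_with_object({}) do |(name, (encode, decode)), out|
    blob = encode.call(payload)
    raise "#{name}: round trip changed the data" unless decode.call(blob) == payload
    out[name] = Benchmark.realtime { rounds.times { decode.call(blob) } }
  end
end

if __FILE__ == $PROGRAM_NAME
  decode_timings(build_payload(20_000)).sort_by { |_, secs| secs }.each do |name, secs|
    printf("%-8s %.3f s\n", name, secs)
  end
end
```

Each codec's output is checked against the original before timing. A fast decoder that silently changes the data would otherwise win the comparison.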

...

Feedback from commenters here and on Reddit, Hacker News and Google+.

#json #json parsers #python #valgrind #msgpack #benchmarks #massif #cjson #rapidjson

Deep-copying Ruby objects with #msgpack.

A [well-known](http://stackoverflow.com/questions/8206523/how-to-create-a-deep-copy-of-an-object-in-ruby) way to deep-copy nested Arrays and Hashes is to use `Marshal`: `Marshal.load(Marshal.dump(obj))`. However, when the object you want to deep-copy can be represented in MessagePack (*1), for example something decoded from JSON, `to_msgpack`/`MessagePack.unpack` can be faster. Let's check with the following benchmark program:

    require 'open-uri'
    require 'benchmark'
    require 'active_support'
    require 'active_support/json'
    require 'msgpack'

    # Sample JSON document, decoded into nested Arrays/Hashes.
    # URI.open comes from open-uri; plain open() no longer fetches URLs on Ruby 3.0+.
    json = URI.open('https://raw.github.com/msgpack/msgpack-ruby/master/spec/cases.json').read
    obj = ActiveSupport::JSON.decode(json)

    n = 1_000_000

    Benchmark.bmbm do |x|
      x.report('Marshal:') { n.times { Marshal.load(Marshal.dump(obj)) } }   # Marshal round trip
      x.report('Msgpack:') { n.times { MessagePack.unpack(obj.to_msgpack) } } # MessagePack round trip
      x.report('JSON:')    { n.times { ActiveSupport::JSON.decode(json) } }   # re-decode the original JSON
    end

Running it gives:

    $ ruby deep_copy_benchmark.rb
    Rehearsal --------------------------------------------
    Marshal:  40.610000   0.410000  41.020000 ( 41.021917)
    Msgpack:  11.860000   0.280000  12.140000 ( 12.133655)
    JSON:     20.040000   0.960000  21.000000 ( 21.005835)
    ---------------------------------- total: 74.160000sec

                   user     system      total        real
    Marshal:  40.440000   0.290000  40.730000 ( 40.722019)
    Msgpack:  11.820000   0.250000  12.070000 ( 12.067954)
    JSON:     20.420000   0.780000  21.200000 ( 21.204583)

Using `MessagePack` is faster than using `Marshal`. What's more, even with the double work of packing and then unpacking, it is faster than re-decoding the original JSON. That said, `Marshal` is fast enough for normal use. Keep this as a trick up your sleeve for when you are deep-copying constantly and need to cut that cost, or need to shave off one more millisecond.

----
__*1__: Note that this trick can't be used if, for example, the object contains UTF-8 strings → https://github.com/msgpack/msgpack/issues/121
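Here is a small stdlib-only sketch of what "deep copy" buys you. It contrasts `dup`, which copies only the outer container, with the `Marshal` round trip used as the baseline above. The `deep_copy` helper name is ours. The sketch also shows why the trick is limited to objects the format can represent. A serializer round trip only gives back what the format can express. For example, JSON has no Symbols, so Symbol keys come back as Strings.

```ruby
require 'json'

# Deep copy via a Marshal round trip: every nested Array/Hash/String is rebuilt.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = { 'user' => { 'name' => 'alice', 'tags' => ['a', 'b'] } }

shallow = original.dup          # new outer Hash, but nested objects are shared
deep    = deep_copy(original)   # fully independent copy

shallow['user']['tags'] << 'c'  # also visible through `original`
deep['user']['tags'] << 'z'     # visible only through `deep`

p original['user']['tags']  # => ["a", "b", "c"]
p deep['user']['tags']      # => ["a", "b", "z"]

# A serializer round trip keeps only what the format can express:
# Symbol keys come back from JSON as Strings, so the copy is not equal.
with_symbols = { name: 'bob' }
p JSON.parse(JSON.generate(with_symbols)) == with_symbols  # => false
```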

#msgpack #ruby

Msgpack vs JSON in the jaws of compression

In my previous post I showed how you can squeeze more out of your datastores by using compression. Similar rules apply to bandwidth and data transmission over the wire. There are various serialization formats to choose from: Protobuf, Thrift and Avro, to name a few. For key/value stores I always prefer schema-free serialization formats, because they give me error-free data loading even when the data changes over time. The favorite schema-free format is obviously JSON. However, the new kid on the block is MsgPack. I've been playing around with MsgPack in a few of my projects for a while, and it is actually good: compact, well documented, and supported on lots of platforms. But how does it compare to JSON (not in terms of performance or speed)? We all place huge trust in JSON; it's already the superhero that slayed XML, despite its weaknesses. In this little adventure I set out to compare JSON and MsgPack in terms of bytes when compressed! Let's get straight down to business. Here is the source code I used:

I simply load about 200 random tweets, then encode those tweets to JSON and MsgPack, with Gzip and LZ4 compression. The results are pretty disturbing in the case of Gzip:

LZ4 looks quite normal, just as we expected. With Gzip, however, for just 200 tweets MsgPack takes 189,057 bytes while JSON takes only 177,976 bytes. Bingo! Now this is what I call a smart combination. You get two standard components that are available not only to the native applications you write but also in the modern browsers you use today! In JavaScript, too, you can load the data and simply use it with no special decoder. Some of you may be wondering what the big deal is. Here is the deal: if you can detect that the browser supports gzip Content-Encoding over XHR, you can serve gzipped JSON directly out of the data store to your clients (i.e. no fetching, encoding to JSON and gzipping on every request). You can use a similar technique with cache systems like NGINX + Memcached [using HttpMemcachedModule] to serve some of your REST calls really quickly (user profile, user info, etc.).
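Below is a minimal sketch of the "serve gzipped JSON straight from the store" idea. It assumes a Rack-style `[status, headers, body]` response and uses a plain Ruby Hash as a stand-in for the data store. `STORE`, `put`, `accepts_gzip?` and `serve` are hypothetical names; none of this is an NGINX or Memcached API. The document is compressed once at write time. At read time the stored bytes go out untouched when the client accepts gzip, and are inflated only for clients that don't.

```ruby
require 'json'
require 'zlib'

# Hypothetical stand-in for a key/value store holding gzipped JSON documents.
STORE = {}

# Encode once at write time: JSON, then gzip.
def put(key, record)
  STORE[key] = Zlib.gzip(JSON.generate(record))
end

# Simplified Accept-Encoding check (ignores q-values such as "gzip;q=0").
def accepts_gzip?(accept_encoding)
  accept_encoding.to_s.split(',').any? { |enc| enc.strip.start_with?('gzip') }
end

# Returns a Rack-style [status, headers, body] triple.
def serve(key, accept_encoding)
  blob = STORE[key]
  return [404, {}, ''] unless blob

  headers = { 'Content-Type' => 'application/json', 'Vary' => 'Accept-Encoding' }
  if accepts_gzip?(accept_encoding)
    # Stored bytes go out as-is: no decode, re-encode or re-compress.
    [200, headers.merge('Content-Encoding' => 'gzip'), blob]
  else
    # Fallback for clients without gzip support.
    [200, headers, Zlib.gunzip(blob)]
  end
end

put('user:1', { 'id' => 1, 'name' => 'alice' })
status, headers, body = serve('user:1', 'gzip, deflate')
puts status, headers['Content-Encoding'], body.bytesize
```

The `Vary: Accept-Encoding` header tells intermediate caches that the gzipped and plain responses for the same URL are different representations.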

#GZIP #MsgPack #JSON #HTTP #Javascript #Compression #NGINX

MessagePack: A New Way of Messaging

MessagePack (http://msgpack.org/)

MessagePack is an efficient object serialization library: a very compact and fast binary data format with rich data structures compatible with JSON.

MessagePack-RPC is the remote procedure call system on top of the MessagePack serialization library.

The Features

Handling primitive types: integer, nil, boolean, float, double, raw

Handling some containers: array, map

Streaming deserialization

Language bindings: C, C++, Ruby, Python, Perl, PHP, Java, Haskell, D, Erlang, Lua

Non-Features

User-defined records (simulate them with an array or map)

Serialization with cyclic references
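To show what "compact" means at the byte level, here is a minimal encoder for a subset of the format. It is an illustrative sketch, not a replacement for a real binding. It follows the current MessagePack specification, in which the `raw` type listed above has since been split into `str` and `bin`, so Ruby Strings are written as `str`. As the non-features list suggests, a user-defined record would be passed in as an Array or Hash. Cyclic references are rejected. The helper names `mp_pack`, `mp_header` and `mp_int` are hypothetical.

```ruby
# Minimal MessagePack encoder for a subset of the current spec:
# nil, booleans, integers, floats, UTF-8 strings, arrays and maps.
def mp_pack(obj, out = String.new(encoding: Encoding::BINARY), path = {}.compare_by_identity)
  case obj
  when nil   then out << 0xc0
  when false then out << 0xc2
  when true  then out << 0xc3
  when Integer then mp_int(obj, out)
  when Float   then out << 0xcb << [obj].pack('G')   # float 64, big-endian
  when String
    bytes = obj.encode(Encoding::UTF_8).b
    mp_header(out, bytes.bytesize, 0xa0, 31, 0xd9, 0xda, 0xdb) # fixstr/str8/16/32
    out << bytes
  when Array, Hash
    # Cyclic references are a non-feature: refuse a container that is
    # already being encoded further up the current path.
    raise ArgumentError, 'cyclic reference' if path.key?(obj)
    path[obj] = true
    if obj.is_a?(Array)
      mp_header(out, obj.size, 0x90, 15, nil, 0xdc, 0xdd)      # fixarray/array16/32
      obj.each { |v| mp_pack(v, out, path) }
    else
      mp_header(out, obj.size, 0x80, 15, nil, 0xde, 0xdf)      # fixmap/map16/32
      obj.each { |k, v| mp_pack(k, out, path); mp_pack(v, out, path) }
    end
    path.delete(obj)
  else
    raise TypeError, "unsupported type: #{obj.class}"
  end
  out
end

# Length prefix: fix form if it fits, else the 8-bit (str only), 16- or 32-bit form.
def mp_header(out, len, fix_base, fix_max, code8, code16, code32)
  if len <= fix_max then out << (fix_base | len)
  elsif code8 && len <= 0xff then out << code8 << len
  elsif len <= 0xffff then out << code16 << [len].pack('n')
  else out << code32 << [len].pack('N')
  end
end

# Smallest integer format that fits, as the spec recommends.
def mp_int(n, out)
  raise RangeError, 'integer out of range' unless n.between?(-2**63, 2**64 - 1)
  if n >= 0
    return out << n                     if n <= 0x7f         # positive fixint
    return out << 0xcc << n             if n <= 0xff         # uint8
    return out << 0xcd << [n].pack('n') if n <= 0xffff       # uint16
    return out << 0xce << [n].pack('N') if n <= 0xffff_ffff  # uint32
    out << 0xcf << [n].pack('Q>')                            # uint64
  else
    return out << (n & 0xff)             if n >= -32          # negative fixint
    return out << 0xd0 << [n].pack('c')  if n >= -0x80        # int8
    return out << 0xd1 << [n].pack('s>') if n >= -0x8000      # int16
    return out << 0xd2 << [n].pack('l>') if n >= -0x8000_0000 # int32
    out << 0xd3 << [n].pack('q>')                             # int64
  end
end

puts mp_pack({ 'a' => [1, -1, true, nil] }).bytes.map { |b| format('%02x', b) }.join(' ')
# prints "81 a1 61 94 01 ff c3 c0": 8 bytes, versus 22 for the JSON text {"a":[1,-1,true,null]}
```

The cycle check tracks only the containers on the current encoding path. A structure that shares the same child twice (e.g. `[b, b]`) is still allowed; only a true cycle is refused.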

#messagepack #msgpack #Serialization #JSON #RPC


I modified kt-msgpack to make the procedures for operating Kyoto Tycoon more useful

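Kyoto Tycoon supports plugins in the form of shared libraries (.so, .dylib). Available plugins include one that implements the memcached protocol and kt-msgpack, which makes Kyoto Tycoon usable over msgpack-rpc. Because kt-msgpack is built on msgpack-rpc, you can manipulate Kyoto Tycoon data not only with synchronous procedure calls but also with asynchronous calls and with notification-style calls that return no response. However, reading the source, I found that it exposed very few of the procedures Kyoto Tycoon provides, so I forked the project from git and fleshed it out a bit.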
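The three call styles map onto MessagePack-RPC's message types. Below is a rough sketch that assumes the message layout from the MessagePack-RPC specification: request `[0, msgid, method, params]`, response `[1, msgid, error, result]`, notification `[2, method, params]`. Messages are shown as plain Ruby arrays before serialization. `ToyRpcServer` and its `set`/`get` procedures are hypothetical stand-ins backed by a Hash, not kt-msgpack's actual interface. An asynchronous client sends several requests without waiting and matches the responses back by `msgid`. A notification never gets a response.

```ruby
# MessagePack-RPC message layouts, shown as Ruby arrays before serialization:
#   request      [0, msgid, method, params]
#   response     [1, msgid, error, result]
#   notification [2, method, params]
REQUEST = 0
RESPONSE = 1
NOTIFICATION = 2

# Hypothetical server backed by a Hash standing in for Kyoto Tycoon.
# The procedure names and parameter lists are illustrative only.
class ToyRpcServer
  def initialize
    @db = {}
    @procedures = {
      'set' => ->(key, value) { @db[key] = value; true },
      'get' => ->(key) { @db[key] }
    }
  end

  # Returns a response array for a request, or nil for a notification.
  def handle(msg)
    case msg[0]
    when REQUEST
      _, msgid, name, params = msg
      error, result = invoke(name, params)
      [RESPONSE, msgid, error, result]
    when NOTIFICATION
      _, name, params = msg
      invoke(name, params) # result is discarded: no response is sent
      nil
    else
      raise ArgumentError, "unknown message type: #{msg[0].inspect}"
    end
  end

  private

  # Returns [error, result]; error is nil on success.
  def invoke(name, params)
    fn = @procedures[name]
    return ['no such method', nil] unless fn

    [nil, fn.call(*params)]
  rescue ArgumentError => e # e.g. wrong number of params for the lambda
    [e.message, nil]
  end
end

# Asynchronous use: fire several requests, then match responses by msgid.
server = ToyRpcServer.new
server.handle([NOTIFICATION, 'set', ['00000001', 'v1']]) # no reply expected
replies = [[REQUEST, 2, 'get', ['00000001']], [REQUEST, 1, 'get', ['missing']]]
          .map { |req| server.handle(req) }
by_id = replies.to_h { |(_, msgid, error, result)| [msgid, [error, result]] }
p by_id[2] # => [nil, "v1"]
```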

Procedures I fleshed out

The procedures I improved or added in this modification are as follows.

ping

echo

status

add

set

get

remove

append

seize

clear

replace

cas

increment

increment_double

match_prefix

match_regex

set_bulk

remove_bulk

get_bulk

vacuum

synchronize

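With data-manipulation, search and bulk procedures in place, this should cover most needs. There is one oddly named procedure, ping. It corresponds to void, but because of a constraint in msgpack-rpc (or perhaps in the tool that generates C++ code from the IDL), the procedure had to be given a different name. For details, see the list of Procedures here.

Procedures I didn't support

For various reasons, the following procedures are not supported this time.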

report

play_script

tune_replication

cur_jump

cur_jump_back

cur_step

cur_step_back

cur_set_value

cur_remove

cur_get_key

cur_get_value

cur_get

cur_seize

cur_delete

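report, play_script and tune_replication need more work (in some cases Kyoto Tycoon itself may need changes), so they are not supported this time.

The cur_xxx procedures are painful to implement because the architectures of Kyoto Tycoon and msgpack-rpc (mpio) differ. Kyoto Tycoon handles cursors by keeping a cursor object in session-local storage. msgpack-rpc (mpio), however, has no mechanism for using sessions the way Kyoto Tycoon does, so implementing the same functionality would be hard without modifying msgpack-rpc. (Or maybe I just don't know that msgpack-rpc has a similar feature?)

In any case, regarding cursor operations, I agree with the view on the Kyoto Tycoon author's blog: anything that holds resources over the network for a long time wastes server resources. If you need that, I personally think you should do the cursor operations in a stored-procedure-style script such as play_script and get the job done quickly. So going forward, I'm leaning toward not supporting the cur_xxx procedures at all.

Performance measurement

Having added all these procedures, I measured the performance of bulk data operations, which was what I was most curious about. Each measurement was run five times and averaged. The server and the client ran on the same machine.

Measurement programs

I used ktremotetest, which is bundled with Kyoto Tycoon when you install it, and ktmpremotetest, a version of it adapted for kt-msgpack. For comparison, ktremotetest was also used to measure Kyoto Tycoon's HTTP and binary protocols. Incidentally, ktmpremotetest is installed along with kt-msgpack when you git clone and install the modified version from here.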

Machine environment

The machine environment is as follows.

MacBook (13-inch, Early 2009)

OS: Mac OS X Lion 10.7.3

CPU: 2 GHz Intel Core 2 Duo

Memory: 4 GB 667 MHz DDR2 SDRAM

HDD: 256GB (FUJITSU MHZ2250BH FFS G1 Media SATA)

Kyoto Tycoon startup options

Kyoto Tycoon is started with the following options.

$ ktserver -th 8 -lz -plsv /opt/local/libexec/libktmsgpack.dylib -plex "port=18801#thread=8" "casket.kct#bnum=2000000#opts=ls#ktopts=p"

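Number of buckets: twice the 1 million records (bnum=2000000)

Database options: 4-byte addressing and linear lists (opts=ls)

To measure performance fairly, the number of Kyoto Tycoon threads and the number of kt-msgpack plugin threads are set to the same value (-th 8, thread=8)

No log output (-lz)

Data is persisted (ktopts=p)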
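The database argument and the `-plex` plugin expression in the command above both pack their settings as `name=value` pairs joined by `#`. The illustrative sketch below splits the two strings so each option listed above can be checked against the command line. The `parse_pairs`, `parse_db_spec` and `parse_plex` helpers are hypothetical; Kyoto Tycoon does its own parsing.

```ruby
# Turns "name=value" parts into a Hash (values stay Strings).
def parse_pairs(parts)
  parts.each_with_object({}) do |part, opts|
    name, value = part.split('=', 2)
    opts[name] = value
  end
end

# Database argument: a file path followed by "#name=value" tuning parameters.
def parse_db_spec(spec)
  path, *parts = spec.split('#')
  [path, parse_pairs(parts)]
end

# Plugin expression (-plex): "#"-separated pairs with no leading path.
def parse_plex(expr)
  parse_pairs(expr.split('#'))
end

path, db = parse_db_spec('casket.kct#bnum=2000000#opts=ls#ktopts=p')
plex = parse_plex('port=18801#thread=8')

puts path                            # casket.kct
puts Integer(db['bnum']) / 1_000_000 # 2 buckets per record
puts plex['thread']                  # 8, same as ktserver -th 8
```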

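What was measured

I did the same thing as in this article by the Kyoto Tycoon author: read and write a total of 1 million records against the server, with string keys and values such as "00000001" and "00000002". Eighteen patterns were measured with the commands below: 9 set patterns and 9 get patterns.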

ktremotetest bulk -th 4 -set -bulk 1 250000

ktremotetest bulk -th 4 -set -bulk 10 250000

ktremotetest bulk -th 4 -set -bulk 100 250000

ktremotetest bulk -th 4 -set -bin -bulk 1 250000

ktremotetest bulk -th 4 -set -bin -bulk 10 250000

ktremotetest bulk -th 4 -set -bin -bulk 100 250000

ktmpremotetest bulk -th 4 -set -bulk 1 250000

ktmpremotetest bulk -th 4 -set -bulk 10 250000

ktmpremotetest bulk -th 4 -set -bulk 100 250000

ktremotetest bulk -th 4 -get -bulk 1 250000

ktremotetest bulk -th 4 -get -bulk 10 250000

ktremotetest bulk -th 4 -get -bulk 100 250000

ktremotetest bulk -th 4 -get -bin -bulk 1 250000

ktremotetest bulk -th 4 -get -bin -bulk 10 250000

ktremotetest bulk -th 4 -get -bin -bulk 100 250000

ktmpremotetest bulk -th 4 -get -bulk 1 250000

ktmpremotetest bulk -th 4 -get -bulk 10 250000

ktmpremotetest bulk -th 4 -get -bulk 100 250000

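-th is the number of threads to launch. -set sets records with set_bulk. -get retrieves records with get_bulk. -bulk is the number of records set or retrieved at once in each bulk operation. ktremotetest's -bin performs bulk set/get over the binary protocol; without this option, bulk set/get goes over HTTP.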
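The `-bulk` value matters because it sets how many records travel per call. Suppose each of the 4 threads moves its 250,000 records in calls of `-bulk` records each; this is a simplifying assumption about how the tools batch. Then the number of calls drops tenfold at each step, which is a plausible reason the times fall so sharply as the bulk size grows. The sketch below shows that arithmetic, using a hypothetical `round_trips` helper and keys in the "00000001" format described above.

```ruby
THREADS = 4            # -th 4
PER_THREAD = 250_000   # records per thread; 4 x 250,000 = 1,000,000

# Calls needed to move `records` records, `bulk` at a time (last batch may be partial).
def round_trips(records, bulk)
  (records + bulk - 1) / bulk
end

[1, 10, 100].each do |bulk|
  puts "bulk #{bulk}: #{THREADS * round_trips(PER_THREAD, bulk)} calls"
end
# bulk 1: 1000000 calls
# bulk 10: 100000 calls
# bulk 100: 10000 calls

# Batching keys in the same format (hypothetical slice of the key space).
keys = (1..25).map { |i| format('%08d', i) } # "00000001", "00000002", ...
p keys.each_slice(10).map(&:size)            # => [10, 10, 5]
```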

Results

The results for bulk operations on 1 million records are as follows.

| # | Protocol | Operation | Bulk | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
|---|---|---|---|---|---|---|---|---|---|
| 1 | HTTP | set | 1 | 124.643 | 126.641 | 119.626 | 125.788 | 140.967 | 127.533 |
| 2 | HTTP | set | 10 | 20.731 | 17.873 | 15.776 | 16.058 | 15.281 | 17.1438 |
| 3 | HTTP | set | 100 | 7.052 | 6.719 | 8.246 | 6.54 | 6.778 | 7.067 |
| 4 | Binary | set | 1 | 70.699 | 71.705 | 66.925 | 70.883 | 70.841 | 70.2106 |
| 5 | Binary | set | 10 | 8.44 | 9.193 | 8.939 | 9.877 | 8.776 | 9.045 |
| 6 | Binary | set | 100 | 4.449 | 4.923 | 4.341 | 4.412 | 4.309 | 4.4868 |
| 7 | msgpack-rpc | set | 1 | 78.078 | 85.017 | 79.846 | 77.657 | 87.636 | 81.6468 |
| 8 | msgpack-rpc | set | 10 | 16.943 | 14.585 | 13.157 | 13.645 | 17.013 | 15.0686 |
| 9 | msgpack-rpc | set | 100 | 7.261 | 7.19 | 6.894 | 7.659 | 10.003 | 7.8014 |
| 10 | HTTP | get | 1 | 117.963 | 124.22 | 122.057 | 120.705 | 132.164 | 123.4218 |
| 11 | HTTP | get | 10 | 16.756 | 19.225 | 18.153 | 18.522 | 21.171 | 18.7654 |
| 12 | HTTP | get | 100 | 8.068 | 8.733 | 11.251 | 9.193 | 9.028 | 9.2546 |
| 13 | Binary | get | 1 | 61.831 | 72.635 | 74.087 | 70.9 | 75.88 | 71.0666 |
| 14 | Binary | get | 10 | 9.883 | 10.741 | 11.739 | 10.249 | 13.278 | 11.178 |
| 15 | Binary | get | 100 | 5.861 | 7.47 | 6.11 | 6.061 | 6.048 | 6.31 |
| 16 | msgpack-rpc | get | 1 | 79.438 | 89.539 | 74.367 | 92.129 | 91.659 | 85.4264 |
| 17 | msgpack-rpc | get | 10 | 18.262 | 19.146 | 16.453 | 20.083 | 19.446 | 18.678 |
| 18 | msgpack-rpc | get | 100 | 11.175 | 10.415 | 10.667 | 10.95 | 14.237 | 11.4888 |
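The averages and rankings can be recomputed from the raw runs. The minimal sketch below transcribes the five runs per row from the table and ranks the protocols from fastest to slowest for each operation and bulk size. The names `RUNS`, `mean` and `ranking` are ours, and the times are in whatever unit the tools report.

```ruby
# Five runs per configuration, transcribed from the results table.
RUNS = {
  ['HTTP', 'set', 1]          => [124.643, 126.641, 119.626, 125.788, 140.967],
  ['HTTP', 'set', 10]         => [20.731, 17.873, 15.776, 16.058, 15.281],
  ['HTTP', 'set', 100]        => [7.052, 6.719, 8.246, 6.54, 6.778],
  ['Binary', 'set', 1]        => [70.699, 71.705, 66.925, 70.883, 70.841],
  ['Binary', 'set', 10]       => [8.44, 9.193, 8.939, 9.877, 8.776],
  ['Binary', 'set', 100]      => [4.449, 4.923, 4.341, 4.412, 4.309],
  ['msgpack-rpc', 'set', 1]   => [78.078, 85.017, 79.846, 77.657, 87.636],
  ['msgpack-rpc', 'set', 10]  => [16.943, 14.585, 13.157, 13.645, 17.013],
  ['msgpack-rpc', 'set', 100] => [7.261, 7.19, 6.894, 7.659, 10.003],
  ['HTTP', 'get', 1]          => [117.963, 124.22, 122.057, 120.705, 132.164],
  ['HTTP', 'get', 10]         => [16.756, 19.225, 18.153, 18.522, 21.171],
  ['HTTP', 'get', 100]        => [8.068, 8.733, 11.251, 9.193, 9.028],
  ['Binary', 'get', 1]        => [61.831, 72.635, 74.087, 70.9, 75.88],
  ['Binary', 'get', 10]       => [9.883, 10.741, 11.739, 10.249, 13.278],
  ['Binary', 'get', 100]      => [5.861, 7.47, 6.11, 6.061, 6.048],
  ['msgpack-rpc', 'get', 1]   => [79.438, 89.539, 74.367, 92.129, 91.659],
  ['msgpack-rpc', 'get', 10]  => [18.262, 19.146, 16.453, 20.083, 19.446],
  ['msgpack-rpc', 'get', 100] => [11.175, 10.415, 10.667, 10.95, 14.237]
}.freeze

def mean(xs)
  xs.sum / xs.size
end

# Protocol names, fastest (lowest mean time) first.
def ranking(op, bulk)
  RUNS.select { |(_, o, b), _| o == op && b == bulk }
      .sort_by { |_, runs| mean(runs) }
      .map { |(proto, _, _), _| proto }
end

%w[set get].product([1, 10, 100]).each do |op, bulk|
  puts "#{op} bulk #{bulk}: #{ranking(op, bulk).join(' > ')}"
end
```

On these numbers Binary is fastest in every configuration. At bulk size 100, however, HTTP edges out msgpack-rpc for both set and get.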

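As expected, Binary performs best, and msgpack-rpc also performs quite well. Judging from these results, performance ranks roughly as follows, although at bulk size 100 HTTP was slightly faster than msgpack-rpc for both set and get: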

Binary > msgpack-rpc > HTTP

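That said, once you increase the number of records per bulk operation, all of them process data at a reasonable speed, so any of them should be fine in practice. The benefit of msgpack-rpc may be that you can flexibly choose between synchronous and asynchronous communication while still getting decent performance. (Well, you could do that with Node.js too...)

Summary

I forked kt-msgpack and fleshed out the procedures it exposes from Kyoto Tycoon. With this modification, other languages can use msgpack-rpc for data operations with set and get, bulk operations, searches, and cas and increment. The bulk-operation performance of msgpack-rpc falls mostly between HTTP and Binary. So if you use Kyoto Tycoon and want to control your logic flexibly through the communication layer, kt-msgpack may be a good choice.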

#kyototycoon #msgpack #performance #msgpack-rpc

Oh, a paper. I happen to be working with it right now, so this is helpful.

#msgpack
