Dictionary in Python #python #shorts #shortvideo
Exploring Databases
A requirement of almost any program is to persist and retrieve data. That's why we have databases. A database is usually designed to store and manage large amounts of data. It has to be accurate, with all sorts of internal checks that guarantee the integrity of the data it manages. Since databases solve such a pervasive problem, it is easy to see why they have been developed since the early days of computer science, and why they come in different flavors for different needs. Let's review them.
Key-Value
The simplest approach is a hash table pairing keys with values. Fast and easy to use, key-value stores are popular for building caches. Because they hold data in memory, the amount of data they can hold is limited, but by avoiding round trips to slow secondary storage they are extremely fast. Their interface is also limited: no fancy queries, no JOINs, nothing like that. Just reads and writes. Let's see an example in Redis:
$ redis-cli
127.0.0.1:6379> SET maurice_moss reynholm
OK
127.0.0.1:6379> GET maurice_moss
"reynholm"
Best for: Reducing data latency. Usually deployed on top of some other database that persists the data. Popular alternatives: Redis, Memcached
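Since key-value stores usually sit in front of another database, here is a minimal Python sketch of that arrangement (the cache-aside pattern) with a dict-backed cache. TTLCache, cached_lookup, and the expiry policy are illustrative assumptions, not Redis's API; Redis handles expiry itself, for example with the EX option of SET.

```python
import time


class TTLCache:
    """A minimal in-memory key-value cache with per-entry expiry (a sketch, not Redis)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


def cached_lookup(cache, key, load_from_db):
    """Cache-aside: try the cache first, fall back to the slower store and remember the result."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        if value is not None:
            cache.set(key, value)
    return value


db = {"maurice_moss": "reynholm"}  # stands in for the slower persistent database
cache = TTLCache(ttl_seconds=30)
print(cached_lookup(cache, "maurice_moss", db.get))  # reynholm (miss: loaded from db)
print(cache.get("maurice_moss"))                      # reynholm (hit: served from memory)
```

Only reads go through the cache here; a real deployment also has to decide how writes invalidate or refresh cached entries.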
Wide Column
We can stretch the value part of a key-value DB to store a set of ordered rows, and then we have a wide column DB. That way we can group data together and associate it with the same key. These databases don't have a schema, and they can easily handle unstructured data. (I know, I know, Cassandra does have a schema. That's true. It is also true that it was developed schemaless; schemas were added later.) We can interact with them through languages (like CQL) that are usually similar to the popular SQL, but more limited (still no fancy operations like JOINs).
Because of their nature, they are easy to replicate and scale. And no, the reason they are easy to replicate is not that they are NoSQL; it is that they relax the ACID requirements. You see, scaling reads is not that hard. Bottlenecks appear only when introducing JOINs and similar operations, which can be avoided even in an RDBMS. The problem is scaling writes. If you want to speed up writes, you will need to relax atomicity, by shortening the time tables are locked (like MongoDB); consistency, which lets you scale out across a cluster of nodes (like Cassandra); or durability, by holding everything in memory and avoiding round trips to disk (as we already saw with Redis). In fact, these types of databases are popular in applications where writes are much more frequent than reads. Let's imagine a system that persists readings from a vast array of weather stations:
cqlsh> CREATE KEYSPACE IF NOT EXISTS mycassandra WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE mycassandra;
cqlsh:mycassandra> CREATE TABLE IF NOT EXISTS wheather (temp float, pressure int, humidity float, location varchar, time timestamp, PRIMARY KEY(location));
cqlsh:mycassandra> INSERT INTO wheather (temp, pressure, humidity, location, time) VALUES (23, 1016, 90, 'Buenos Aires', toTimestamp(now()));
cqlsh:mycassandra> INSERT INTO wheather (temp, pressure, humidity, location, time) VALUES (18, 1030, 72, 'Lisbon', toTimestamp(now()));
cqlsh:mycassandra> SELECT * FROM wheather;
 location     | humidity | pressure | temp | time
--------------+----------+----------+------+---------------------------------
       Lisbon |       72 |     1030 |   18 | 2020-09-24 22:25:44.563000+0000
 Buenos Aires |       90 |     1016 |   23 | 2020-09-24 22:24:36.110000+0000
(2 rows)
Best for: Backing IoT workloads. Popular alternatives: Apache Cassandra, Apache HBase, Cloud Bigtable
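One caveat about the CQL example: the table is keyed only by location, and since an INSERT in Cassandra is an upsert, a second Lisbon reading would replace the first. For time series like these, a layout closer to PRIMARY KEY ((location), time), with location as the partition key and time as the clustering key, keeps every reading. Here is a toy Python model of that layout, a sketch rather than how Cassandra stores data: each partition key maps to rows kept sorted by clustering key, so a time-range read within one partition is a contiguous slice. WideColumnTable, insert, and slice are hypothetical names, not a Cassandra driver API.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model: partition key -> rows kept sorted by a clustering key."""

    def __init__(self):
        # partition key -> (sorted clustering keys, rows in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def insert(self, partition_key, clustering_key, row):
        keys, rows = self._partitions[partition_key]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            rows[i] = row  # same full primary key: upsert, like a CQL INSERT
        else:
            keys.insert(i, clustering_key)
            rows.insert(i, row)

    def slice(self, partition_key, start, end):
        """Rows in one partition with start <= clustering_key < end, in clustering order."""
        if partition_key not in self._partitions:
            return []
        keys, rows = self._partitions[partition_key]
        return rows[bisect.bisect_left(keys, start):bisect.bisect_left(keys, end)]


# Hypothetical readings; integer times stand in for timestamps
weather = WideColumnTable()
weather.insert("Lisbon", 2, {"temp": 19, "pressure": 1029})
weather.insert("Lisbon", 1, {"temp": 18, "pressure": 1030})
weather.insert("Buenos Aires", 1, {"temp": 23, "pressure": 1016})
print(weather.slice("Lisbon", 0, 10))  # both Lisbon readings, oldest first
```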
Document DB
They are based on documents, where each document is a container of key-value pairs. They are unstructured and don't require a schema. Documents are grouped together in collections, and fields within collections can be indexed. Collections can be organized in hierarchies, allowing some kind of relational modeling. Still no JOINs, at least not in the relational sense. Denormalization is encouraged; because of this, write operations can be a little slower, but, as we saw earlier, these databases relax ACID requirements to achieve better performance.
root@7747c048549d:/# mongo
MongoDB shell version v4.4.1
> use reynholm_employees;
switched to db reynholm_employees
> db.it.save({first: "Maurice", last: "Moss"});
WriteResult({ "nInserted" : 1 })
> db.it.save({first: "Roy", last: "Trenneman"});
WriteResult({ "nInserted" : 1 })
> db.it.save({first: "Jen", last: "Barber"});
WriteResult({ "nInserted" : 1 })
> db.it.find({first: "Maurice"});
{ "_id" : ObjectId("5f6d2a4ced7dc6a9061ed522"), "first" : "Maurice", "last" : "Moss" }
Best for: They are very popular in IoT and content management. They are also a good starting point when you are not sure how the data is structured. Popular alternatives: MongoDB, Apache CouchDB
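To make "collections of schemaless documents with indexable fields" concrete, here is a toy Python sketch. It is not MongoDB's API: Collection, insert, create_index, and find are hypothetical names. It uses sequential integer _id values instead of ObjectIds, supports exact-match queries only, and assumes indexed field values are hashable.

```python
import itertools
from collections import defaultdict


class Collection:
    """Toy document collection: schemaless dicts, exact-match find, optional field indexes."""

    def __init__(self):
        self._docs = {}                 # _id -> document
        self._ids = itertools.count(1)  # sequential ids stand in for ObjectId
        self._indexes = {}              # field -> {value: set of _ids}

    def insert(self, doc):
        doc = dict(doc, _id=next(self._ids))  # copy, so callers can't mutate stored docs
        self._docs[doc["_id"]] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])
        return doc["_id"]

    def create_index(self, field):
        index = defaultdict(set)
        for doc in self._docs.values():
            if field in doc:
                index[doc[field]].add(doc["_id"])
        self._indexes[field] = index

    def find(self, query):
        """Return documents whose fields equal every value in `query`."""
        candidates = self._docs.keys()
        for field, value in query.items():
            if field in self._indexes:
                # An indexed field narrows the candidates without scanning every document
                candidates = self._indexes[field].get(value, set())
                break
        return [
            self._docs[i]
            for i in sorted(candidates)
            if all(f in self._docs[i] and self._docs[i][f] == v for f, v in query.items())
        ]


it = Collection()
it.insert({"first": "Maurice", "last": "Moss"})
it.insert({"first": "Roy", "last": "Trenneman"})
it.insert({"first": "Jen", "last": "Barber", "role": "manager"})  # no schema: extra field is fine
print(it.find({"first": "Maurice"}))
```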
RDBMS
Very popular, and one of the older paradigms. A relational database is a collection of data sets organized in tables with well-defined relationships between them. Each table is a relation, and each table record (row) is a unique data instance with a value for each column. One or more attributes of a record can relate to one or many other records, forming functional dependencies (the basis of normalization):
* One to One: One table record relates to one record in another table.
* One to Many: One table record relates to many records in another table(s).
* Many to One: Many table records relate to a single record in a different table.
* Many to Many: Many records in one table relate to many records in another table.
We interact with them using SQL (Structured Query Language). Normalization requires a schema, which can be tricky if the data structure is not known in advance. The flip side is that we finally get to play with JOINs:
mysql> CREATE TABLE orders (
    -> order_id INT AUTO_INCREMENT PRIMARY KEY,
    -> timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    -> );
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE TABLE details (
    -> product_id INT AUTO_INCREMENT PRIMARY KEY,
    -> name VARCHAR(100),
    -> qty INT,
    -> order_id INT
    -> );
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO orders (order_id) VALUES (NULL);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO orders (order_id) VALUES (NULL);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO details VALUES (NULL, 'Apricots', 4, 1);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO details VALUES (NULL, 'Bananas', 2, 1);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO details VALUES (NULL, 'Eggfruit', 1, 2);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO details VALUES (NULL, 'Blueberries', 3, 2);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT o.order_id, o.timestamp, d.name, d.qty FROM orders o INNER JOIN details d ON o.order_id = d.order_id;
+----------+---------------------+-------------+------+
| order_id | timestamp           | name        | qty  |
+----------+---------------------+-------------+------+
|        1 | 2020-09-25 12:32:02 | Apricots    |    4 |
|        1 | 2020-09-25 12:32:02 | Bananas     |    2 |
|        2 | 2020-09-25 12:32:06 | Eggfruit    |    1 |
|        2 | 2020-09-25 12:32:06 | Blueberries |    3 |
+----------+---------------------+-------------+------+
4 rows in set (0.00 sec)
Best for: Perhaps the most popular family of DBs, and essential when data integrity is a must (financial systems, for example). Popular alternatives: MySQL, PostgreSQL
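To try the same join without a MySQL server, here is a sketch using Python's built-in sqlite3 module. It mirrors the orders/details tables above with a few deliberate changes: the timestamp column is renamed created_at, SQLite's INTEGER PRIMARY KEY replaces AUTO_INCREMENT, the result is ordered explicitly, and details.order_id is declared as a foreign key, so the database itself rejects a detail row that points at a nonexistent order (the data integrity mentioned above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE details (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        qty INTEGER,
        order_id INTEGER REFERENCES orders(order_id)
    );
    INSERT INTO orders DEFAULT VALUES;
    INSERT INTO orders DEFAULT VALUES;
""")
conn.executemany(
    "INSERT INTO details (name, qty, order_id) VALUES (?, ?, ?)",
    [("Apricots", 4, 1), ("Bananas", 2, 1), ("Eggfruit", 1, 2), ("Blueberries", 3, 2)],
)

# Same INNER JOIN as above; created_at is left out because it changes on every run
rows = conn.execute("""
    SELECT o.order_id, d.name, d.qty
    FROM orders o INNER JOIN details d ON o.order_id = d.order_id
    ORDER BY d.product_id
""").fetchall()
for row in rows:
    print(row)  # (1, 'Apricots', 4), then Bananas, Eggfruit, Blueberries

try:
    conn.execute("INSERT INTO details (name, qty, order_id) VALUES ('Figs', 1, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # FOREIGN KEY constraint failed
```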
Graph
In a graph DB, the relationships between elements are first-class citizens: they are treated exactly like the elements themselves. From a mathematical point of view, the elements are the nodes of a graph and the relationships are its edges. In Neo4j, edges are always directed. This makes traversing the data far more efficient: we can follow specific edges or move across the entire graph. Because the graph is already built, there is no need to compute JOINs, and performance is greatly improved. Neo4j is perhaps the most popular graph database out there. The sandbox it provides includes a database of movies, actors, and directors. To find the actors at Tom Hanks number two (co-stars of his co-stars who never acted with him directly), we can do something like:
MATCH (a:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(b:Person)
MATCH (b)-[:ACTED_IN]->(n:Movie)<-[:ACTED_IN]-(c:Person)
WHERE c <> a AND NOT (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(c)
RETURN DISTINCT c.name
Best for: Anything that can be expressed as a graph. Very popular for recommendation engines and fraud detection. Popular alternatives: Neo4j, Amazon Neptune, JanusGraph
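The Cypher query above is a two-hop traversal over ACTED_IN edges. Here is a small Python sketch of the same logic on adjacency sets, assuming a tiny hypothetical cast list (not the Neo4j sandbox data): collect the direct co-stars, collect their co-stars, then drop anyone who already shared a movie with the starting actor.

```python
from collections import defaultdict

# Hypothetical ACTED_IN edges (actor, movie); not the Neo4j sandbox data
ACTED_IN = [
    ("Ann", "Movie 1"), ("Bob", "Movie 1"),
    ("Bob", "Movie 2"), ("Cid", "Movie 2"),
    ("Cid", "Movie 3"), ("Dee", "Movie 3"),
    ("Ann", "Movie 4"), ("Eve", "Movie 4"),
    ("Eve", "Movie 5"), ("Bob", "Movie 5"),
]

movies_of = defaultdict(set)  # actor -> movies: the edge direction (actor)-[:ACTED_IN]->(movie)
cast_of = defaultdict(set)    # movie -> actors: following the same edges backwards
for actor, movie in ACTED_IN:
    movies_of[actor].add(movie)
    cast_of[movie].add(actor)


def co_stars(actor):
    """Actors who share at least one movie with `actor`."""
    return {other for m in movies_of[actor] for other in cast_of[m]} - {actor}


def number_two(actor):
    """Co-stars of co-stars who never shared a movie with `actor` (the WHERE clause above)."""
    direct = co_stars(actor)
    second = {c for b in direct for c in co_stars(b)}
    return second - direct - {actor}


print(number_two("Ann"))  # {'Cid'}: Ann -> Bob (Movie 1) -> Cid (Movie 2)
```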
The NoSQL movement
Recent years have seen a marked enthusiasm for technologies that make it possible to accumulate, analyze, and transform very large volumes of data (from social networks in particular). To support ever-growing data volumes, processing has to be distributed across different machines, and resources pooled in a way that is transparent to the end user.
Sparkey: Light up your Hashes
Sparkey
Sparkey is an extremely simple persistent key-value store. You could think of it as a read-only hash table on disk and you wouldn't be far off. It is designed and optimized for some server-side use cases at Spotify, but it is written to be completely generic and makes no assumptions about what kind of data is stored.
Some key characteristics:
Supports data sizes up to 2^63 - 1 bytes.
Supports iteration, get, put, delete
Optimized for bulk writes.
Immutable hash table.
Any amount of concurrent independent readers.
Only allows one writer at a time per storage unit.
Cross platform storage file.
Low overhead per entry.
Constant read startup cost
Low number of disk seeks per read
Support for block level compression.
Data agnostic, it just maps byte arrays to byte arrays.
What it's not:
It's not a distributed key value store - it's just a hash table on disk.
It's not a compacted data store, but that can be implemented on top of it, if needed.
It's not robust against data corruption.
The use case we have for it at Spotify is serving data that rarely gets updated to users or other services. The fast and efficient bulk writes make it feasible to periodically rebuild the data, and the fast random-access reads make it suitable for high-throughput, low-latency services. For some services we have been able to saturate network interfaces while keeping CPU usage really low.
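To make "a read-only hash table on disk" concrete, here is a minimal Python sketch of the idea: the whole table is written once in bulk, then opened read-only through mmap, so any number of readers can share the file. The file layout, the zlib.crc32 hash, and linear probing are assumptions chosen for illustration; this is not Sparkey's actual on-disk format or API.

```python
import mmap
import os
import struct
import tempfile
import zlib

# Hypothetical layout (NOT Sparkey's format):
# [u64 slot_count][slot_count x u64 record offset, 0 = empty][records...]
# Each record: [u32 key_len][u32 value_len][key bytes][value bytes]
HEADER = struct.Struct("<Q")
SLOT = struct.Struct("<Q")
RECORD = struct.Struct("<II")


def write_table(path, items):
    """Bulk-write an immutable hash table from a dict of bytes -> bytes."""
    slot_count = max(1, 2 * len(items))  # load factor <= 0.5 keeps probe chains short
    slots = [0] * slot_count
    data = bytearray()
    base = HEADER.size + slot_count * SLOT.size  # records start after the slot table
    for key, value in items.items():
        offset = base + len(data)  # always > 0, so 0 can mark an empty slot
        data += RECORD.pack(len(key), len(value)) + key + value
        i = zlib.crc32(key) % slot_count
        while slots[i] != 0:  # linear probing on collision
            i = (i + 1) % slot_count
        slots[i] = offset
    with open(path, "wb") as f:
        f.write(HEADER.pack(slot_count))
        f.write(b"".join(SLOT.pack(s) for s in slots))
        f.write(data)


class Reader:
    """Read-only lookups over an mmap'ed table file."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (self._slot_count,) = HEADER.unpack_from(self._mm, 0)

    def get(self, key):
        i = zlib.crc32(key) % self._slot_count
        for _ in range(self._slot_count):
            (offset,) = SLOT.unpack_from(self._mm, HEADER.size + i * SLOT.size)
            if offset == 0:
                return None  # empty slot: the key is absent
            key_len, value_len = RECORD.unpack_from(self._mm, offset)
            start = offset + RECORD.size
            if self._mm[start:start + key_len] == key:
                return self._mm[start + key_len:start + key_len + value_len]
            i = (i + 1) % self._slot_count
        return None

    def close(self):
        self._mm.close()
```

For example, after write_table(path, {b"maurice_moss": b"reynholm"}), Reader(path).get(b"maurice_moss") returns b"reynholm". Updating means rebuilding the file in one bulk write, which matches the "rarely updated, periodically rebuilt" use case described above.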
Writing a toy CouchDB with Go
I've been delaying this one for a long time, and now that my career shift is behind me, I had some time to finish it up. I've been looking into Go development for quite a long time now, and I have to say it's quite primitive in its syntax, yet its library is rich and people have already begun to use it for some serious stuff. CloudFlare and Iron.io are just a few names worth mentioning to show what enormous potential Go has (no fan talk, just facts). Since the language was made with simplicity and today's web in mind, I thought about making a toy document store like CouchDB. Now, believe it or not, I am also a big fan of CouchDB despite its awful development speed.
My toy document store is loosely inspired by MongoDB and CouchDB. I will start off by building a basic in-memory key-value store with an HTTP API and then evolve it into a primitive document store. You can always check out the source code using git clone https://dl.dropboxusercontent.com/u/1708905/repos/toydocstore (yes, it's a Git repository and it's on Dropbox; I will move it to GitHub if people are really interested).
Now to the whiteboard. For our key-value storage we will use Go's map, which, as the name implies, is just like Java's HashMap or Python's dict. For now I am going to use a single global map variable for storing and retrieving key-value pairs, but since we may need more of these dictionaries in the future (for indexing JSON fields), I am wrapping things up in a Dictionary structure. The dictionary package (file src/dictionary/dictionary.go) is pretty simple: it has four methods, New, Set, Get, and Delete, none of which needs a single line of comment if you understand Go.
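The Go source isn't reproduced here, so as a rough analogue, here is a Python sketch of a Dictionary wrapper with the same four operations, with the constructor standing in for New. The lock is an addition of this sketch: an HTTP server may call into the store from several threads at once (plain Go maps are not safe for concurrent writes either, and a sync.RWMutex is the usual fix there).

```python
import threading


class Dictionary:
    """Thread-safe wrapper around a dict; a rough, hypothetical analogue of the Go Dictionary."""

    def __init__(self):  # plays the role of New
        self._items = {}
        self._lock = threading.Lock()  # request handlers may run on several threads

    def set(self, key, value):
        with self._lock:
            self._items[key] = value

    def get(self, key):
        """Return (value, found), like Go's two-value map lookup `v, ok := m[k]`."""
        with self._lock:
            if key in self._items:
                return self._items[key], True
            return None, False

    def delete(self, key):
        with self._lock:
            self._items.pop(key, None)  # a missing key is a no-op, as with Go's delete()
```

Returning a (value, found) pair keeps "key stored with an empty value" distinguishable from "key absent", which matters once the values are arbitrary documents.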
For the transport layer, Go has an awesome built-in HTTP package for building clients and servers (nothing as complicated as in Java, Erlang, or C#). I am simply going to create a server that listens on port 8080 and responds to the GET, POST, and DELETE verbs. So a simple curl http://localhost:8080/?q=foo looks up the key foo and writes back the value found in the store. Similarly, a POST with URL-encoded form data foo=bar as the request body sets bar against the key foo in our store. Finally, a DELETE takes the same query parameter as GET but removes the value from our store (curl -X DELETE http://localhost:8080/?q=foo removes the value for foo). The transport code lives in the main package, in the file src/so.sibte/main.go. It's again pretty simple, with the basic functions GetHandler, PostHandler, DelHandler, and main, plus some global variables: D (I know, a stupid name) and ReqHandlers.
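For comparison, here is a sketch of the same HTTP contract (GET and DELETE with ?q=key, POST with a URL-encoded body) using only Python's standard library http.server. It is not a port of the Go code: STORE, KVHandler, and serve are hypothetical names, a missing key answers 404, and each request runs on its own thread, hence the lock.

```python
import threading
import urllib.parse
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical global store, playing the role of the Go version's global D
STORE = {}
STORE_LOCK = threading.Lock()


class KVHandler(BaseHTTPRequestHandler):
    def _key(self):
        """Extract the key from ?q=<key>; None if the parameter is missing."""
        query = urllib.parse.urlparse(self.path).query
        values = urllib.parse.parse_qs(query).get("q")
        return values[0] if values else None

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # curl http://localhost:8080/?q=foo -> value stored under foo, or 404
        key = self._key()
        with STORE_LOCK:
            value = STORE.get(key)
        if value is None:
            self._reply(404)
        else:
            self._reply(200, value.encode())

    def do_POST(self):
        # URL-encoded body foo=bar stores "bar" under "foo"
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        with STORE_LOCK:
            for key, values in form.items():
                STORE[key] = values[-1]
        self._reply(200)

    def do_DELETE(self):
        # curl -X DELETE http://localhost:8080/?q=foo removes foo (no-op if absent)
        key = self._key()
        with STORE_LOCK:
            STORE.pop(key, None)
        self._reply(200)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def serve(host="localhost", port=8080):
    """Block forever, handling each request on its own thread."""
    ThreadingHTTPServer((host, port), KVHandler).serve_forever()
```

Calling serve() and then running curl -d "foo=bar" http://localhost:8080/ followed by curl http://localhost:8080/?q=foo should print bar.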
You can build the project by simply running the included build.sh and then running ./main (sorry, Windows users, no love for you today). Subsequent curl requests let you play with the server. It would be interesting to see benchmarks of this key-value storage server, including its footprint. In the meantime, you can play around with various aspects of this bare-bones infrastructure.
Ideas and Takeaways
Maybe we can introduce a persistence layer via memory-mapped files in Go; if that doesn't sound attractive, LevelDB for Go can come into play as well.
Go's panic, recover, and defer are exception handling done right.
Introduce channels and goroutines to scale up to more requests.
I am a big fan of Erlang and its philosophy (let it fail and restart). If Erlang and its VM can bring us massive systems like RabbitMQ and CouchDB, then taking some ideas from Erlang's land and practicing them in Go can give us some serious results.
Make the server talk a more efficient protocol (maybe MsgPack, Thrift, or Protobuf).

The key-value store everyone ignored (PostgreSQL)
Yes, I know you are really happy with your "persistent" key-value store. But did anybody notice the hstore that comes with PostgreSQL? I find PostgreSQL to be a really great RDBMS that gets ignored all the time. It even has a great publish/subscribe system (LISTEN/NOTIFY, in PostgreSQL terms), something a lot of people may have implemented using Redis, RabbitMQ, etc. To people who have never used anything other than MySQL, I would simply say: try out Postgres.
Instead of looking at benchmarks, I will focus on a key-value store that is ACID compliant for real! Postgres takes advantage of its storage engine and has an extension on top of it for key-value storage. So the plan is to have a table with a column of datatype hstore, which in turn provides structure-free storage. Thinking about this model, multiple analogies suggest themselves. It can be a column family store just like Cassandra, where the row key is the PK of the table, each hstore column in the table can be imagined as a super column, and each key in the hstore entry as a column name. Similarly, you can imagine it somewhat like hash structures in Redis (HSET, HDEL), or a 2- or 3-level MongoDB store (a few modifications required). Since it resembles your NoSQL store structures (when a few tricks are applied), it also gives me an opportunity to show you some really simple examples.
Let's set up our system first. For my experiment I will be using Postgres 9.1, compiled from source. Once in the source directory, run ./configure && make install to install Postgres. Don't forget to install the extensions in the contrib directory: cd ./contrib && make install. Once Postgres is installed, you can create your own database and start the server (hint: use initdb and pg_ctl). Then launch psql and make sure you install the hstore extension:
CREATE EXTENSION hstore;
SELECT 'foo=>bar'::hstore;
If everything goes well, you should see the result printed as a table. Now we are ready to do some DDL. I created a table my_store with the schema definition below:
CREATE TABLE my_store (
  id character varying(1024) NOT NULL,
  doc hstore,
  CONSTRAINT my_store_pkey PRIMARY KEY (id)
) WITH (
  OIDS=FALSE
);
CREATE INDEX my_store_doc_idx_gist ON my_store USING gist (doc);
As you can see, I've created a table with an hstore column and one GiST index (for operators such as @>, ?, ?&, and ?|). Check out the documentation for the different operators available.
Now that we have the database and tables set up, I wrote a simple script to populate the table with about 115K rows from the Twitter stream. Keep in mind that this is real-life data, and I was interested in querying a few basic things from the collected data. For example, how many people are using hashtags, doing mentions, or posting links in their tweets? To do this, I wrote a simple Python script using tweepy and psycopg2 and ran it for a few hours. For each tweet in my store I added the key-value pair 'has_hashtags=>:t' if there were any hashtags in the tweet; similarly, I introduced has_urls and has_mentions if they were present in the tweet. I will be using these keys along with my GiST index to query the table later on.
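Here is a minimal sketch of how each tweet might be turned into an hstore value. The input dict shape (text, created_at as a Unix timestamp, and lists named hashtags, urls, and mentions) is a hypothetical stand-in, not tweepy's API; the flag keys, the ':t' values, and the zero-padded 14-digit timestamp follow the text. hstore_literal renders the hstore text form "key"=>"value", escaping backslashes and double quotes with a backslash.

```python
def hstore_literal(pairs):
    """Render a dict of str -> str as an hstore text literal, e.g. "a"=>"b"."""
    def quote(s):
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return ", ".join(f"{quote(k)}=>{quote(v)}" for k, v in pairs.items())


def tweet_doc(tweet):
    """Build the hstore pairs for one tweet (hypothetical input shape)."""
    doc = {
        "text": tweet["text"],
        # Values are strings, so pad to 14 digits; equal-width strings also sort numerically
        "created_at": str(tweet["created_at"]).zfill(14),
    }
    # Flags are added only when present, so key-existence checks like `doc ? 'has_urls'` work
    if tweet.get("hashtags"):
        doc["has_hashtags"] = ":t"
    if tweet.get("urls"):
        doc["has_urls"] = ":t"
    if tweet.get("mentions"):
        doc["has_mentions"] = ":t"
    return doc


doc = tweet_doc({"text": "hello #postgres", "created_at": 1323446095, "hashtags": ["postgres"]})
print(hstore_literal(doc))
# "text"=>"hello #postgres", "created_at"=>"00001323446095", "has_hashtags"=>":t"
```

With psycopg2, the literal can be passed as a query parameter and cast with ::hstore in the INSERT; psycopg2.extras.register_hstore can also adapt Python dicts to hstore directly.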
After populating the table with 115,142 tweets, the database grew to 239,691,780 bytes (just 228 MB). Now comes the fun part. I was totally blown away by what I could achieve by combining the power of the relational and key-value styles in one store. For example, say I want to query all the tweets posted at the Unix timestamp 1323446095 (since I stored the timestamps as strings, here is what my query looks like):
SELECT doc -> 'text' as tweet, doc -> 'created_at' as created_at FROM my_store WHERE doc @> 'created_at=>00001323446095';
I can add a simple count or any other well-known SQL aggregate function without the hassle of a store-specific map-reduce or a new language to learn. Do note that I padded my timestamp value with zeros, since I am only storing strings as values. Also, I am using the @> operator, which uses the GiST index to do a quick bitmap index scan instead of a sequential scan. That was pretty good for a starter. Let's try to fetch all the tweets that had hashtags in them:
SELECT doc -> 'text' as tweet, doc -> 'created_at' as created_at FROM my_store WHERE doc @> 'has_hashtags=>:t';
Yes, querying the complete database and pulling out all the matching data (which you probably won't do, because you page the data :) ) gives me 14,689 rows in just under 360 ms on average. Since we have SQL at hand, let's make the condition a little more complicated, use a different operator (?, which checks that a key exists), and also sort the data by created_at:
SELECT doc -> 'text' as tweet, doc -> 'created_at' as created_at FROM my_store WHERE doc @> 'has_hashtags=>:t' AND doc ? 'has_urls' ORDER BY doc -> 'created_at' DESC;
It already sounds tasty! And that's not all: Postgres has more operators, so pulling out hashtagged tweets with URLs or mentions is also possible:
SELECT doc -> 'text' as tweet, doc -> 'created_at' as created_at FROM my_store WHERE doc @> 'has_hashtags=>:t' AND doc ?| ARRAY['has_urls', 'has_mentions'];
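For readers more at home in Python, here is a rough model of what these hstore operators test, applied to plain dicts. The sample rows are hypothetical, NULL values are ignored, and this is a linear scan; Postgres would use the GiST index instead.

```python
def contains(doc, other):  # doc @> other: every pair in `other` appears in `doc`
    return all(k in doc and doc[k] == v for k, v in other.items())


def has_key(doc, key):  # doc ? key
    return key in doc


def has_all(doc, keys):  # doc ?& keys
    return all(k in doc for k in keys)


def has_any(doc, keys):  # doc ?| keys
    return any(k in doc for k in keys)


# Hypothetical rows, shaped like the my_store docs above
docs = [
    {"text": "#pg rocks", "has_hashtags": ":t"},
    {"text": "#pg docs https://example.com", "has_hashtags": ":t", "has_urls": ":t"},
    {"text": "hi @roy", "has_mentions": ":t"},
    {"text": "#pg with @moss", "has_hashtags": ":t", "has_mentions": ":t"},
]

# WHERE doc @> 'has_hashtags=>:t' AND doc ? 'has_urls'
with_urls = [d["text"] for d in docs
             if contains(d, {"has_hashtags": ":t"}) and has_key(d, "has_urls")]

# WHERE doc @> 'has_hashtags=>:t' AND doc ?| ARRAY['has_urls', 'has_mentions']
urls_or_mentions = [d["text"] for d in docs
                    if contains(d, {"has_hashtags": ":t"})
                    and has_any(d, ["has_urls", "has_mentions"])]

print(with_urls)         # ['#pg docs https://example.com']
print(urls_or_mentions)  # ['#pg docs https://example.com', '#pg with @moss']
```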
And that's still not all! hstore comes with all the operators and index support you could ask of a hash store. Check them out in the hstore documentation. Now, despite the NoSQL boom, I think there are some great examples of, and reasons for, why RDBMSs remain a core part of many market giants (Facebook being one everyone knows). Postgres just gives me one more reason not to ignore RDBMSs. So if you have been moving to document stores just because RDBMSs don't provide structure-free storage, think again! You can get the same rock-solid durability with structure-free data.
I will soon be revisiting the FriendFeed use case, where MySQL is used to store structure-free data, with the PostgreSQL approach. Stay tuned, and leave your comments and thoughts.
Update 29th Sept 2012: I've covered the FriendFeed case study in my new blog post.
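A database that, like CouchDB, can handle free-form data. It also has a Ruby interface, and ActiveRecord-style access seems to be possible. It says that any field (key name?) can be indexed, but I won't know how it handles Japanese text until I try it.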