As some of you know, the Silk team joined Palantir over a year ago. We have maintained Silk as a free service to date. Unfortunately, we have decided to shut down the service by December 15th, 2017.
Why?
It's becoming harder and harder to keep Silk.co running without a significant amount of effort and maintenance.
We've always intended to make Silk reliable, secure and free of bugs. We are no longer able to live up to this standard.
Many of the third-party services we use to keep Silk.co running have been discontinued, have changed, or have increased their pricing. In addition, security concerns and upcoming changes to legal frameworks for web services have increased the liability of running Silk.co without active management.
What will happen next?
Your Silks will remain accessible until December 15th, 2017. After that date, your Silks will no longer be available and Silk.co will go offline.
You will be able to export all data from your Silks from your dashboard. Scroll down to the bottom of the settings page and use the "Download the .zip" button to export your entire Silk. You will receive a download link in your email after a few minutes. (Note: if you get a security warning in Chrome, please use another browser such as Firefox or Safari to access your dashboard.)
We will maintain a backup of all data for another month, and will delete all data, including user data, no later than January 31, 2018.
Are there alternatives to Silk.co I can use?
While no tool offers the same set of features as Silk, we can suggest the following options depending on your use case:
For creating visualizations, have a look at Plot.ly, RAWGraphs, or Datawrapper.
For a tool to create dashboards with multiple visualizations, check out Google Data Studio.
The upcoming Coda looks like a promising tool to create interactive spreadsheets with.
Some of Silk's data storytellers have been providing data cleaning and visualization services through their own agency. If you need their services, please contact [email protected] and we'll connect you with them.
Thank you so much for your support over the years.
We're happy to announce today that the Silk team is joining Palantir.
Silk started with the goal of helping people get the most out of their data. Over the last few years, we've worked relentlessly on a vision to help people structure, query, visualize and share data.
We're proud of the product and community we built, and of all the data journalists, activists, NGOs, businesses and many other kinds of people that were able to find important insights and tell great data stories through Silk.
When we met the Palantir team, we realized that we could work on even bigger and more important data problems with an incredibly talented team, even if it meant no longer working on the Silk product. We decided to join Palantir because we believe we can achieve a larger impact there than we could at Silk alone.
What will happen to Silk.co?
Silk.co as a platform will continue to operate. Nothing will change for existing Silks, and you can still create a new Silk for free. However, because of our new roles at Palantir, Silk.co will operate "as is" and we will no longer be able to provide technical or customer support to new or existing Silk accounts, nor will we be doing any further development work or adding new features to the hosted Silk.co product.
Your data, including the data in your Silks, email addresses, passwords, and any other information will remain confidential and as always, not be shared.
We have immensely enjoyed working on Silk and helping all our users, and are looking forward to the next chapter ahead with our friends and future colleagues at Palantir.
This post on enhancing data is the final in a 3-part series on extracting, cleaning and enhancing data. Be sure to also read part 1 on extracting data, and part 2 on cleaning data.
When you have extracted (or downloaded) and cleaned your data, you might find that you are still missing a few things before you can get a story out of it. In this post, we'll share some of our tricks on how to use the data points in your dataset to extract, calculate and look up additional information. We'll use Google Sheets in these examples, but Excel works the same, unless we say otherwise.
Let's say you have a dataset you want to analyze, and it looks like this:
Job Title Job Offer Link Job Compensation Scrape Tags Scrape Job Compensation Equity
Marketing/Growth Hacker https://angel.co/liveongo/jobs/130607-marketing-growth-hacker ₹300k - ₹500k · 0.0 - 0.05% Full Time · Bangalore · Growth Hacker · Branding · Business Development · Business Strategy ₹300k - ₹500k 0.0 - 0.05%
This is part of a list of job postings, scraped from a jobs database at angel.co, that include the keyword "Growth Hacker". There are a couple of columns with information for each job posting, including the URL, a salary indication, and a column with tags describing the job. On closer inspection you will notice there's actually a lot more data hidden in those cells:
For one, the cells with descriptive tags contain multiple terms that are only useful as distinct values.
The dataset has some abbreviated terms (the salary specification) that are difficult to use as they are.
Also, that column with salary info actually only shows a range, not exact numbers, and the ranges are listed in various currencies.
Finally, there is no separate column with the name of the company that posted the job opening, but that name is hidden in the URL of the job posting. So you want to distill that company name and put it in a separate column: the scrape actually included another dataset with background info on a lot of these companies, and so we want to add this information to our dataset.
Now, how do you even begin to enhance your original dataset with all this hidden information? Let's go through this step by step and see how simple formulas in Excel or Google Sheets can solve all of the above. If you want to skip ahead to the end result and see what the Silk team created from the data described above, have a look here.
1. Multiple values in one cell
When dealing with multiple values in one cell, you can split ranges with the SPLIT formula.
Example usage: You can use the SPLIT formula to split up all the different descriptive tags that are smooshed together in a single cell. In this case, the contents of one of the cells in column D looked like this: "Full Time · Remote OK · Marketing". We can easily split this up on the · symbol. The formula would look like this: =SPLIT(D2," · ",false).
Similarly, that column with salary information contains a range of numbers and looks like this: "$50k - $58k". We can use the SPLIT formula again to get the "minimum salary" and "maximum salary" out of the salary range given in cell C2, using =SPLIT(C2," - ",false).
In Excel, use the Text to Columns command.
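If you would rather script this step, the same splitting can be sketched in a few lines of Python. This is only an illustration: the variable names are ours, and the cell contents are the examples quoted above.

```python
# Example cell contents, copied from the examples in this post.
tags_cell = "Full Time · Remote OK · Marketing"
salary_cell = "$50k - $58k"

# Rough equivalent of =SPLIT(D2," · ",false): split on the whole delimiter string.
tags = tags_cell.split(" · ")

# Rough equivalent of =SPLIT(C2," - ",false): minimum and maximum of the range.
min_salary, max_salary = salary_cell.split(" - ")

print(tags)
print(min_salary, max_salary)
```

Splitting on the full " · " string (spaces included) plays the same role as passing false to SPLIT: the delimiter is treated as one unit rather than as a set of separate characters.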
2. Working with ranges of numbers
You can convert abbreviated values with the SUBSTITUTE formula.
Example usage: As you may have noticed, the salary ranges were expressed in an abbreviated way, such as "$50k". Since we want to do some calculations with these numbers, we need the full salary amounts. One of the options then is using =SUBSTITUTE(C2,"k","000"). Apply this formula, and "$50k" will become "$50000".
You can calculate the median of numbers with the MEDIAN formula.
Example usage: We already have the minimum and maximum of the salary range that is offered, split up in cells G2 and H2. So let's add the median salary that can be calculated from this range, using =MEDIAN(G2:H2).
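For readers who prefer code, here is a minimal Python sketch of the same two steps: expanding an abbreviated amount and taking the median of the range. The helper name parse_salary and the list of currency symbols it strips are our own assumptions, not part of the original spreadsheet.

```python
from statistics import median

def parse_salary(text):
    """Turn an abbreviated amount such as "$50k" into the integer 50000."""
    # Assumption: amounts start with one of these currency symbols.
    digits = text.strip().lstrip("$€₹").replace(",", "")
    if digits.lower().endswith("k"):
        return int(float(digits[:-1]) * 1000)
    return int(float(digits))

min_salary = parse_salary("$50k")
max_salary = parse_salary("$58k")

# With only two values, the median is simply the midpoint of the range.
median_salary = median([min_salary, max_salary])
print(min_salary, max_salary, median_salary)
```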
3. Normalizing multiple currencies
The GOOGLEFINANCE formula is great for working with multiple currencies.
Example usage: Unfortunately, different job postings from our dataset give salary ranges in different currencies. If we want to quickly compare our potential future income, we need to convert them into a single currency. Let's say the dataset contained salaries specified in Euros and in US Dollars, and we want to normalize them and have a column with all salaries specified in US Dollars.
The GOOGLEFINANCE formula is normally used to fetch current securities information from Google Finance, but it also lets you look up the current conversion rate of two currencies. So we can create a formula that fetches the conversion rate of Euros to Dollars and then multiplies this figure with the salaries that are given in Euros. If cell I2 contains the "median salary" that we calculated in Euros, we add the following formula to cell J2: =GOOGLEFINANCE("EURUSD")*I2 to get that same median in Dollars.
Unfortunately Excel doesn't have a comparable formula.
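Outside of Google Sheets, the normalization itself is just a multiplication by a conversion rate that you look up yourself. The Python sketch below assumes a made-up rate of 1.10 Dollars per Euro and made-up salary figures, purely for illustration; it does not fetch live rates.

```python
# Assumed, made-up exchange rate for illustration only (not a live rate).
EUR_TO_USD = 1.10

def to_usd(amount, currency, rates):
    """Convert an amount to US Dollars using a currency -> USD rate table."""
    return round(amount * rates[currency], 2)

rates = {"USD": 1.0, "EUR": EUR_TO_USD}

# Hypothetical median salaries with their currencies.
median_salaries = [(54000, "USD"), (40000, "EUR")]
normalized = [to_usd(amount, currency, rates) for amount, currency in median_salaries]
print(normalized)
```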
4. Comparing and matching data from separate spreadsheets
Look up and match associated data from separate sheets or columns with the VLOOKUP formula.
The last thing missing from our wish list is getting additional company information. We don't have the company name yet but we do have the URL of the job posting that contains the company name. So let's get that name first, by splitting cell B2 containing the URL that looks like https://angel.co/tryfynd/jobs/132608-marketing-lead-growth-hacker. First, we execute =SPLIT(B2,"/jobs",false) to get to the URL with more general info on the company (https://angel.co/tryfynd in this case). Then we use this result in cell K2 and execute =SPLIT(K2,"angel.co/",false) to distill just the company name ("tryfynd") for cell L2.
As mentioned before, we can combine this company name with additional information in another spreadsheet that might be relevant to us. Let's say we've been working so far in "Sheet 1" and we have a large dataset with data on hundreds of companies in "Sheet 2". We know the companies looking for growth hackers are somewhere in "Sheet 2". In order to match them and add more data to our "Sheet 1", we will need to use the following formula: =VLOOKUP(L2,'Sheet 2'!A:E,2,false). This formula consists of 4 elements so let's break it down:
The first part L2 specifies what needs to be looked up - the company name in cell L2 of Sheet 1 in our case.
The second part, 'Sheet 2'!A:E, specifies the range of columns where the formula needs to search for that company name - in our case that range contains columns A through E of Sheet 2.
The third part, 2, specifies from which column within the range you want the formula to give you a result. This means that if column A of Sheet 2 contains the company name and column B its year of founding, the VLOOKUP formula we used will give us the matching year of founding from Sheet 2 for every company we have in Sheet 1.
The last part of the formula is a true or false specifier. This element is part of several formulas (including the SPLIT formula). In the case of VLOOKUP it relates to how exact your match will be. By default (true), the formula assumes the first column of the range is sorted and returns the closest match rather than an exact one. Often you just want an exact match, so you have to specify this by ending your formula with false.
You can add as many additional columns of data to "Sheet 1" as you want by changing the third element of the VLOOKUP formula (changing the number will give you results from different columns within the range).
After applying all of the above formulas, the end result should be a spreadsheet ready for analysis. No need to stop there though: the options described in this post are far from exhaustive. If you want to explore more options see this spreadsheet with the demo dataset of growth hacker job postings. Also make sure to check out Google's Spreadsheets function list.
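The two steps above (pulling the company name out of the URL, then doing an exact-match lookup) can also be sketched in Python. The function name company_from_url and the companies table are hypothetical stand-ins for "Sheet 2", and the founding years in it are made up for the example.

```python
from urllib.parse import urlparse

def company_from_url(job_url):
    """Return the first path segment of a job URL, e.g. "tryfynd"."""
    path = urlparse(job_url).path  # "/tryfynd/jobs/132608-..."
    return path.strip("/").split("/")[0]

# Hypothetical stand-in for "Sheet 2": company name -> background info.
# The founding years are made-up placeholder values.
companies = {
    "tryfynd": {"founded": 2000},
    "liveongo": {"founded": 2001},
}

job_url = "https://angel.co/tryfynd/jobs/132608-marketing-lead-growth-hacker"
company = company_from_url(job_url)

# Exact-match lookup, like ending VLOOKUP with false; None plays the role of #N/A.
info = companies.get(company)
founded = info["founded"] if info else None
print(company, founded)
```

A dictionary lookup only ever matches exactly, which is the behaviour you ask VLOOKUP for when you end the formula with false.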
One more thought
Although we have focused on using Google Sheets and Excel in this post, we can't leave out a short mention of alternative tools to enhance data. OpenRefine is very useful for cleaning data (as we covered in our last post), but it can also enhance data with various plugins. The download page shows various extensions for that purpose.
Sometimes we also use APIs to enhance our datasets. Genderize.io, for example, predicts the gender for a column of first names and returns a probability with each prediction (allowing you to ignore results with a low probability). We used this for our research on The Gender Gap at the Academy Awards, among other Silks.
We hope you have enjoyed this series on data visualization tools and found it to be useful! If you liked it, follow our Twitter account to chat with us and to get updates on more data-greatness.
This post on cleaning data is the second in a 3-part series on extracting, cleaning and enhancing data. Be sure to also check out part 1 about data extracting and part 3 about data enhancing.
It's an inconvenient truth: most data you find on the web is messy and often needs to be thoroughly cleaned. At Silk.co we spend a good amount of time cleaning data and we figured we'd share some of the tricks we picked up along the way!
Google Sheets and Excel tricks
We'll use Google Sheets in these examples, but Excel usually uses the same formula names (if you are using the English version).
The Find & Replace command is very helpful when cleaning data. Use it to change anything that doesn't look right and follows a predictable pattern. Examples are weird notation symbols, an extra space at the end of each value, etc.
Use the PROPER formula to capitalize each word in a string if needed. Example usage: =PROPER(A2).
Use the CONCATENATE formula to combine the values of 2 columns into one column. Example usage: =CONCATENATE(A2," ",B2)
Convert AM/PM times to 24-hour HH:MM easily by using =TEXT(A2,"HH:MM").
Use the SPLIT formula to split the contents of a cell into separate columns, based on a common symbol (a comma, for instance). Example usage: =split(A2,", ",false)
Use Format -> Number to make sure your cells with numbers are formatted appropriately. Here you can check things like thousand separators, the way dates are formatted, currency notations, and more.
Taking care of multiple values: depending on your data visualization tool of choice, you might be able to visualize more than one value per data entry. For this to work, you need to separate the values with a character. A comma usually does the trick, but use another symbol if your values contain commas. Silk.co lets you split on any character on import.
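If your dataset is too large for a spreadsheet, the same tricks translate fairly directly to a script. Here is a rough Python sketch of the formulas above; the sample values are made up, and the mapping to each spreadsheet function is approximate rather than exact.

```python
from datetime import datetime

# Made-up sample value with a stray trailing space.
name = "ada lovelace "

# Trim stray spaces, then capitalize each word (similar to PROPER).
clean_name = name.strip().title()

# Join two values with a space (similar to CONCATENATE).
full_label = " ".join([clean_name, "London"])

# Convert a 12-hour AM/PM time to 24-hour HH:MM (similar to TEXT(A2,"HH:MM")).
time_24h = datetime.strptime("3:45 PM", "%I:%M %p").strftime("%H:%M")

# Split a multi-value cell on a comma followed by a space (similar to SPLIT).
values = "Marketing, Branding, Business Strategy".split(", ")

print(clean_name, full_label, time_24h, values)
```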
OpenRefine: hardcore cleaning time
OpenRefine is a Java-powered data cleaning tool that you run locally and use from within your web browser. When using spreadsheet formulas to clean data doesn't cut it, OpenRefine is your new best friend.
We recently did a presentation at Growth Tribe where we demonstrated how to clean a dataset of job offers with OpenRefine.
One column in the dataset showed the yearly salary and equity together (for example, "₹300k - ₹500k · 0.0 - 0.05%"):
Pick Edit Column -> Add Column Based on This Column. Name the column "Equity".
Use the following expression: value.split(" · ")[-1]. This takes the part that occurs after the "·" character in the "Job Compensation" column, and places it in a new column.
Then on the "Job Compensation" column, pick Edit Cells -> Transform. As the expression, use value.split(" · ")[0]. This keeps only the part before the "·" character and removes everything after it.
To clean up values without a currency symbol, pick Facet -> Text Facet. Select all cells with no currency symbol. Then pick Edit Cells -> Transform. Use the expression: leave empty.
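To make the logic of these steps easier to follow, here is the same idea as a small Python sketch. It is not OpenRefine code: the function names and the list of currency symbols are our own assumptions, and the sample value is the kind of "salary · equity" cell described above.

```python
# Assumption: these are the currency symbols that appear in the dataset.
CURRENCY_SYMBOLS = ("$", "€", "₹", "£")

def split_compensation(cell):
    """Split a "salary · equity" cell into its two parts."""
    parts = cell.split(" · ")
    salary = parts[0]    # the part before the "·" (index 0)
    equity = parts[-1]   # the part after the "·" (last element)
    return salary, equity

def blank_without_currency(salary):
    """Empty out salary values that do not start with a currency symbol."""
    return salary if salary.startswith(CURRENCY_SYMBOLS) else ""

salary, equity = split_compensation("₹300k - ₹500k · 0.0 - 0.05%")
print(salary)
print(equity)
print(repr(blank_without_currency("No salary")))
```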
Check out the Silk we made for the presentation to learn more about this particular example, and be sure to also go to OpenRefine's own website. This is just a small example of what is possible with OpenRefine.
Learn more
A lot of the tips in this post are repurposed from silk-data-handbook.silk.co, a resource for data journalists who work with Silk.co.
csvkit is a suite of command line utilities for converting to and working with CSV files. Not for the faint of heart, but very useful for manipulating large datasets, examining datasets, performing SQL-like queries, joining multiple CSV files, and much more.
There are tons of other great resources on data cleaning and data journalism in general: check out Microsoft's cleaning guide, The School of Data, and Data + Design, to name a few.
If you liked this post, follow our Twitter account to get updates on the next installment of this post and more data-greatness.
Data Journalism Tools Part 1: Extracting and Scraping Data
This post on extracting data from a website is the first in a 3-part series on extracting, cleaning and enhancing data. Be sure to check out part 2 about data cleaning and part 3 about data enhancing as well.
Some sites already have their data in a neat table, allowing you to easily copy and paste it into a spreadsheet. Wikipedia is a good example of this. Others offer their data for download in a spreadsheet format.
However, it's not always that easy. If you come across a website with structured information that isn't downloadable or organized in a table, you might have to turn to specialized tools to extract your data. Here are some of the ones we use at Silk.co to create our own data stories.
And the best part? You don't need to know how to code to use the tools listed below, just a bit of patience and a good idea of what you're looking for. Good luck!
Import.io
Import.io is an amazing tool that lets you extract data from any relatively structured website. Enter a URL on their homepage, and you will be quickly greeted by a structured data table. You can then export the data to a spreadsheet for further cleaning and enhancing.
In the sweet case that no cleanup is necessary and you want to build a Silk, you can even skip that last step. Silk actually has a built-in data extractor powered by Import.io. Just sign up for a Silk account, hit "Extract data from a website" and enter a URL. Import.io then extracts data from the URL, and Silk turns it into a fully searchable database, with the option to create visualizations and data stories.
Your Browser's Developer Tools
Sometimes, when you can't copy data over from the web page itself, you can actually copy it from the HTML source. The HTML source is available from your favourite browser's developer tools.
The "Sources" tab shows you the page's source. This can help you see which scripts pull the data and from where. Once you have this information, you may be lucky enough to find the URL that leads you to clean, structured data. Still following? Here's an example. This interactive visualization on NFL concussion counts appears pretty hard to scrape.
But if you click on "Sources" and analyze the code, you'll find this interesting piece of information.
We end up with this link, which contains a structured JSON file, ready to be converted to a spreadsheet file with a tool like OpenRefine (we'll cover OpenRefine and other tools in our next post on cleaning data).
Sometimes, parts of a web page, such as interactive maps, are populated with data retrieved through API calls. A good alternative for scraping this data is to capture this flow of information by clicking on the "Network" tab. Here, you'll be able to monitor the network operations executed by the script used to run the web page. Sorting by size and filtering on specific types of operation usually helps in finding the script which returns the data you need. Then, you copy and paste the results previewed in the "Response" tab on the right into a text editor, and end up with a file containing the results of the API call used to retrieve the data you wanted.
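Once you have saved such a response, turning it into a spreadsheet-friendly CSV takes only a few lines of Python. The sketch below assumes the response is a JSON list of flat records; the response_text shown here is made up and only stands in for whatever you copied from the browser.

```python
import csv
import io
import json

# Made-up response body standing in for what you copied from the browser.
response_text = '[{"team": "A", "count": 3}, {"team": "B", "count": 5}]'
records = json.loads(response_text)

# Write the records as CSV, using the keys of the first record as the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

Real API responses are often nested, in which case you would first need to pick out the list of records you care about.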
The Google Chrome Scraper Extension
Scraper is a Google Chrome extension that converts data from a webpage into a spreadsheet.
After you install the extension, visit the URL you want to scrape. Highlight an instance of the text you would like to convert to a table, right-click and choose "Scrape similar…". You can directly export the results to a Google Sheet, or tweak the XPath values until you are satisfied with the result. Here is a video tutorial.
Google Sheets' ImportXML Function
One of the most useful things when it comes to harvesting data from the web is learning how to use XPath expressions. XPath is "a query language for selecting nodes from an XML document". Knowing how to use XPath to parse web pages will allow endless scraping possibilities, and help you build a spreadsheet from scratch. You can read more here and here.
For less complex scrapers, you'll find that you don't really need a deep knowledge of XPath to use it. For example, you can easily use XPath to construct web crawlers that turn thousands of pages into a structured spreadsheet.
Let's say I have a list of cities, and I want to pull in information about each from Wikipedia. This is how I would start:
To fill the image column, in cell C2, type:
=importXML(B2,"//td/a/img/@src")
Then drag it down for all the other cities.
//td/a/img/@src is the XPath expression used to query the URL (stored in B2) and return the results found at that path. To understand the exact path of what you need, you can view the source of a webpage and figure it out yourself. Or you can find the object you want (in this case the image) by right-clicking on it and selecting "Inspect Element". You will then see the page source. Find the URL of the image you want, right-click and select "Copy XPath". You will now have the XPath needed to retrieve the image saved on your clipboard.
You can repeat this to fill out all the other columns. For example, the first paragraph describing a city is accessible through:
=JOIN(" ",importXML(B2, "//div[4]/p[1]"))
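To see what these XPath expressions select, you can try them on a small document in Python. The sketch below uses the standard library's ElementTree, which only supports a subset of XPath and needs well-formed markup, so we use a tiny made-up snippet instead of a real Wikipedia page; the URLs and text in it are placeholders.

```python
import xml.etree.ElementTree as ET

# Tiny, made-up, well-formed snippet standing in for a real page.
page = """
<html><body><table><tr>
  <td><a href="/wiki/File:City.jpg"><img src="//example.org/city.jpg"/></a></td>
</tr></table>
<div><p>First paragraph about the city.</p><p>Second paragraph.</p></div>
</body></html>
"""
root = ET.fromstring(page)

# Like //td/a/img/@src: find the images, then read the src attribute with .get().
image_urls = [img.get("src") for img in root.findall(".//td/a/img")]

# First <p> inside each <div>, joined into one string (like JOIN(" ", ...)).
first_paragraphs = " ".join(p.text for p in root.findall(".//div/p[1]"))

print(image_urls)
print(first_paragraphs)
```

Real web pages are rarely well-formed XML, so for actual scraping you would need an HTML-aware parser; the point here is only to show how the path expressions map onto the page structure.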
To Conclude
Ideally, data will always be organized into neatly downloadable packages, but until that becomes the case, we hope this post helps you find the data you need. If you don't succeed on your first attempt, please don't be afraid to try a few of the other options listed here. A bit of persistence goes a long way!
If you liked this post, follow our Twitter account to get updates on the next installment of this post and more data-greatness.
Latest Data Reveals the Actual Size of The Gender Gap at the Academy Awards
A data-driven analysis of gender representation in 88 years of Oscars
Over the 88 editions of the Academy Awards, several things have remained constant. The Red Carpet. The golden statuette. And the absence of women nominees.
Which isn't to say that women are absent from the event. During the Academy Awards season, pictures of actresses monopolize the front pages of websites, magazines, newspapers, and TV shows. Their bodies are discussed, their designer clothes, their hairstyles, their make-up, their dates.
And still, very few women are actually nominated for an Oscar. And even fewer win one.
Let's look at the numbers. Did you know that in the history of the awards, only 17% of the nominees have been women? Or that in the upcoming 2016 Oscars, five categories have no female nominee at all? In this Silk, I explored the data behind the Gender Gap at the prestigious Academy Awards. Read the findings below and follow the original Silk resource for the next updates.
The 2016 Edition: 26% of the awards have an all-male nominee list
Excluding acting awards, there are no female candidates in 5 of the remaining 19 categories. And we already know for sure that no woman will stand a chance of winning a prestigious statuette for Best Cinematography, Best Director, Best Music Score, Best Sound Editing or Best Sound Mixing.
On the other hand, we are probably going to see some slightly sexist stereotypes confirmed when the Costume Design statuette is awarded: 80% of this year's nominees in the field are women. It is the only category out of the 19 where women are better represented than men.
See the footer of this page for a note on what "Weighted" means. In short: "Female nominees" actually refers to "female share of award nominations". One solo female nomination in one award counts as one; one female nomination in an award shared with two guys will count as 0.33. If you prefer to have a look at the absolute numbers, you can click "Explore" and choose to plot the unweighted variables.
Details on Gender of Nominees at the 2016 Academy Awards
The good news: This year marks an all-time record for the share of female nominees
For the 1929–2016 period, the female award nomination share is 17%, a depressing 65 percentage points lower than the male share. Luckily, though, things seem to be slowly getting better. Despite constant lows, the share of female nominations has never dropped below this historic average since 1982. And this year, the 51 nominated women hold a 27.5% share of the nomination pot: the highest ever recorded!
Female and Male share of the nomination list
The bad news #1: at this rate, it will take 118 years to close the Oscars' gender gap
The first Oscar edition of 1929 started with a 10.7% share of female nominations. The current one, with 27.5%, improved the ratio by an impressive 16.8 percentage points. But it took 88 ceremonies.
As we saw before, the percentage of female nominees fluctuated a lot throughout the years, and it's hard to make predictions. But when we use these numbers to calculate the average per-year improvement, we find that the share of female nominations grew on average by 0.19 percentage points per year between 1929 and 2016.
Wondering what this actually means? It means that you'll probably never live to see a gender-balanced Academy Awards ceremony. At this rate, it will take 118 years before we reach an equally distributed rate of male and female nominations. So stay tuned for the 2134 Oscars!
We could assume that women's representation at the Oscars has been improving at a faster and faster pace since the '30s and, therefore, that this gender gap will close sooner than predicted. But the per-decade average share of female nominations doesn't confirm this optimistic assumption. Quite the opposite, actually.
The percentage of female nominations grew especially during the 1970s and '80s (can we thank second-wave feminism for that?). In the last decades, however, the average relative growth has almost stalled. Women made up 22.5% of the nominees between 2000 and 2009, and 22.7% between 2010 and 2016. Not much improvement, overall. Actually, this is the slowest growth rate ever recorded, if we exclude the all-time low records registered during the '40s. Let's hope for a booming crew of ladies to steal the stage in the last editions of the 2010s' Oscars!
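For transparency, here is the back-of-the-envelope arithmetic behind that projection as a small Python sketch. It assumes the share keeps growing linearly and, like the text, rounds the yearly rate to two decimals; the variable names are ours.

```python
first_share, first_year = 10.7, 1929    # % female nominations, first edition
latest_share, latest_year = 27.5, 2016  # % female nominations, current edition

# Total improvement in percentage points over the whole period.
gain = latest_share - first_share

# Average improvement per year, rounded to two decimals as in the text.
rate = round(gain / (latest_year - first_year), 2)

# Years needed to go from the current share to 50%, at that constant rate.
years_to_parity = round((50 - latest_share) / rate)
parity_year = latest_year + years_to_parity

print(rate, years_to_parity, parity_year)
```

This is a simple linear extrapolation, so it inherits all the caveats mentioned above about how much the yearly share fluctuates.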
The bad news #2: Oscar victories are even more likely to be male dominated than the nominee listings
So the Academy seems to be making an effort to nominate more female candidates. But the number of female winners hints that this might be more of a facade operation, rather than a deep structural change in its preference for male candidates.
Between 1929–2015 women got on average 13% of the win share every year. And there's no sign of the (slow) growth that we saw when looking at female nominations. Just three years ago, the share of female wins was below the historical average. And in 2009 it was still just 10%. In other words, over the 88 editions, men held on average 87% of the win share each year.
Female and Male share of statuette wins
The current growth rate of female wins is 0.08 percentage points per year (with all the caveats that applied when predicting the first gender-balanced nominee list). This suggests that women will be equally represented among statuette holders in almost half a millennium! (436 years, to be precise).
This value is, of course, just speculative, and we don't know what the gender gap improvement rate will look like for statuette winners. We do know that, while the share of female nominees grew through the last decades, the same isn't equally true for Oscar wins.
What women (don't) win: Disaggregating the Oscar gender gap
The overall numbers on female nominees through time hide another gender inequality. Women are represented poorly in nearly all categories of the Awards. Yet, the Oscar gender divide is particularly pronounced in certain fields more than in others. To the point that, in some categories, a woman has never been nominated for an Oscar. Let alone won one.
Two categories stand out for women's ratio of Oscar nominations and wins. Interestingly enough, they're both more traditionally associated with female roles and stereotypes: Costume Design and Makeup & Hairstyling. Costume Design is the only category in which women surpass men both among nominees and winners.
Documentaries, both shorts and feature films, are also a good way for women to access a statuette. But other than this, in all other categories, women are below 16% of the nominees and winners.
The previous charts show how the Oscar gender gap articulates in different fields. And that this situation is incredibly unequal in some fields in particular.
Some examples:
Since the Animated Feature Film category was instituted in 2001, it took until 2013 for a woman to finally win an award: Brenda Chapman for Brave. The following year, Jennifer Lee joined the club thanks to Frozen. (Note: both of them shared the prize with their otherwise all-male teams.)
Among Cinematography nominees, it is no wonder there has never been a female winner. So far, we've seen 88 editions and a total of 638 candidates. And not a single female nominee.
Women are also scarce among Directors, Assistant Directors and Special, Visual or Engineering Effects.
For Directing only one woman has won: Kathryn Bigelow for The Hurt Locker in 2009. In this category, the Academy has only nominated four women out of 438 candidates: Lina Wertmuller (1976), Jane Campion (1993), Sofia Coppola (2003) and Bigelow.
For Special/Visual/Engineering Effects only seven female candidates have been nominated out of 681 total nominees. In other words, men comprise 99% of candidates. Only three women have won this category since 1929: Vivian Greenham for The Guns of Navarone (1961), Suzanne Benson for Aliens (1986) and Janek Sirrs for The Matrix (1999).
What we learn by disaggregating the gender gap by category is that some fields are far more gender unbalanced than others, and that the path to a more equal Academy Awards ceremony is more complicated than a sole, indiscriminate, numeric boost of female nominations. It needs to address why women don't actually win a statuette as often as men. And most importantly, the selection of female Oscar talent needs to be more diversified across the different artistic fields.
Notes
To get all the single nominations, more than 16,000 of them, I crawled the AMPAS database, which is my source for this piece. This Oscar dataset is pretty massive and didn't come straight out in a clean, machine-readable format. So you might want to read the notes and methodology section for information about how the dataset was built and structured. I'm open to criticism, debate and questions, so drop me a line at alice [at] silk.co. (I also still have conflicting thoughts about whether to look at absolute or weighted percentages: please share your thoughts on this! I went for weighted percentages, but, for transparency, I kept the absolute number counts in the data as well, so you can edit all the charts to plot these instead.)
Note on weight: Each nomination can have more than one nominee (for example, the Scientific and Engineering Award of 2014 went "to EMMANUEL PRÉVINAIRE, JAN SPERLING, ETIENNE BRANDT and TONY POSTIAU for their development of the Flying-Cam SARAH 3.0 system."). To distinguish nominations with multiple nominees from those with single nominees, I gave each nominee a weight of one divided by the number of nominees sharing that nomination. So, for example, the datapoints referring to EMMANUEL PRÉVINAIRE, JAN SPERLING, ETIENNE BRANDT and TONY POSTIAU each have a "Nomination Weight" of 0.25.
When calculating the % of female and male nominees per year or award category, I made both a calculation using the raw numbers (for example: # female nominees out of the total nominees) and a weighted one (for example: sum of "nomination weight" of female nominees, divided by total sum of "nomination weight" for male and female nominees).
On this note: obviously, only people were counted when calculating the % of female and male nominees. Therefore, if a nomination has, for example, MGM, Marilyn Monroe and James Dean as nominees, each of the three will have a "nomination weight" of 0.33. However, only the two people were added up to calculate male/female ratios (total: 0.66).
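The weighting described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the row layout, the `gender_shares` function and the "Person A" nomination are made up for the example, and the first nomination reuses the MGM, Marilyn Monroe and James Dean illustration from the note above.

```python
from collections import defaultdict

# Hypothetical sample rows: (nomination_id, nominee, gender).
# gender is None for organisations, which are not counted in the ratios.
nominees = [
    ("n1", "MGM", None),
    ("n1", "Marilyn Monroe", "F"),
    ("n1", "James Dean", "M"),
    ("n2", "Person A", "F"),  # invented single-nominee nomination
]

def gender_shares(rows):
    """Return (raw_share, weighted_share) dictionaries keyed by gender."""
    # Every nominee, person or organisation, counts towards the weight.
    per_nomination = defaultdict(int)
    for nomination_id, _name, _gender in rows:
        per_nomination[nomination_id] += 1

    raw = defaultdict(int)
    weighted = defaultdict(float)
    for nomination_id, _name, gender in rows:
        if gender is None:
            continue  # only people enter the male/female ratios
        raw[gender] += 1
        weighted[gender] += 1 / per_nomination[nomination_id]

    raw_total = sum(raw.values())
    weighted_total = sum(weighted.values())
    return (
        {g: raw[g] / raw_total for g in raw},
        {g: weighted[g] / weighted_total for g in weighted},
    )

raw_share, weighted_share = gender_shares(nominees)
```

With this sample, the raw female share is 2 out of 3 people, while the weighted female share is (1/3 + 1) divided by (1/3 + 1/3 + 1), because the shared nomination counts for less per person than the single-nominee one.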
Silk + Google Sheets Sync: Automatically Updated Data Visualizations Every Hour
What if you could, say, automatically collect information on journalism meetups around the world and map the next events on a Silk site?
Well, now you can. As of today, our brand new Google Sheets Sync will let you link your Silk to your Sheet, and interactive visualizations will automatically update to reflect any changes. Once you set up the sync, you will never have to manually re-import a spreadsheet in your Silk ever again.
This means that hundreds of millions of Sheets users can now transform their static spreadsheets into dynamically updated maps, charts, galleries, tables, and full data websites in minutes. And as a Google Sheet is updated over time, the linked Silk will automatically update all visualizations, pages, and datacards to reflect the new changes.
Use powerful formulas to update your Silk automatically
Google Sheets can pull in data from other places using powerful formulas. Google Sheets' ImportXML formula is one of them. Alice scraped information from Meetup.com using the ImportXML formula to track stats on all the journalism meetups around the world and to map upcoming global events (Meetup doesn't let you do any of this!). That way, her Silk journalism-meetups.silk.co is always updated with the latest journalism meetups. Learn how to scrape in Google Sheets.
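ImportXML takes a URL and an XPath query and returns the matching values. For readers who have not used XPath, here is a rough Python analogue of that idea using only the standard library. It is a sketch under assumptions: the sample markup, its element names and classes, and the `import_xml` helper are invented for illustration and are not Meetup.com's real page structure. Python's ElementTree also supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Invented sample markup standing in for a fetched page.
SAMPLE_PAGE = """
<html><body>
  <div class="event"><h3>Data Journalism Meetup</h3><span class="city">Amsterdam</span></div>
  <div class="event"><h3>Hacks and Hackers</h3><span class="city">Berlin</span></div>
</body></html>
"""

def import_xml(markup, xpath):
    """Return the text of every element matching the XPath query."""
    root = ET.fromstring(markup.strip())
    return [(element.text or "").strip() for element in root.findall(xpath)]

# One query per spreadsheet column, as you would do with two formulas.
names = import_xml(SAMPLE_PAGE, ".//div[@class='event']/h3")
cities = import_xml(SAMPLE_PAGE, ".//div[@class='event']/span[@class='city']")
rows = list(zip(names, cities))
```

Each query fills one column, and zipping the columns gives one row per event, which is the shape a Silk import expects from a sheet.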
Wait, there's more
Connect a Google Form to your Silk: Want people to submit data to your Silk without giving them complete access to your Silk? Just connect a Google Form to your Google Sheet. This is great for crowdsourced Silks. Learn how.
Use automation tools to build an automatically updated Silk: Use tools like IFTTT and Zapier to pull data from various sources into your Silk automatically. For example, I've always wanted to archive photos taken around Silk HQ's street. Using an IFTTT Instagram recipe and Google Sheets Sync, it's possible! Learn how.
Want to try it out yourself?
Use the "Import Google Sheet" button on your dashboard like you normally would.
We're happy to announce we've redesigned our filters! Adding a filter is a more streamlined experience now, and we've added features to give you more control over how your filters work.
Here is how the filters look on a live embed from world.silk.co:
Go to your Silk to use the filters on your own data, or read more about the changes below.
Easy access
We've moved the "Add Filter" button out into the open. It's now available directly above every visualization on your Silk.
Range filters
Text can be filtered by typing in a value, or by selecting values that occur often. For numbers, you can select a range to filter on.
Show filters to the people who visit your Silk
You can use the "Configure" link in the filter to set its visibility.
Check out the help article to see all available options.
Silk for Growth Hacking: Teaching the Growth Tribe Class
We recently posted on how recruiting hackers are using Silk to beat data overload. Last week we taught a room full of growth hackers how Silk can help them organize, visualize and analyze data. The class was organized by Growth Tribe, a growth hacking agency in Amsterdam. Fittingly, the class covered how growth hackers could hack growth hacking jobs and pull in information about where those jobs are, what the pay is, and other details.
Jurian Baas, our head of customer success, and Alice Corona, one of Silk's data journalists, built a comprehensive tutorial for the event. Alice covered data scraping with XPath and some Chrome extensions, data cleanup using Open Refine, and then, finally, visualization and analysis in Silk.
Alice created a repeatable and relatively simple set of processes to monitor the growth hacker job market to see who's hiring and to discover employment hubs, most requested skills, average salary and most common job benefits. Of course, this isn't just useful for growth hacker jobs. You could insert any other job description and follow the same instructions to create your own Silk research tool for jobs for data scientists, Java developers, HR specialists, or anything else.
The end result is what you see below. Pretty cool!
Data from Growth Hacker Jobs
http://growth-hacker-jobs.silk.co/
The attendees apparently liked what we taught them (always gratifying).
You don't have to go to Amsterdam for this, either. Just try the tutorial and hack your own job research. Let us know how you do! And thanks to Growth Tribe for having us in. We had a blast.
Brands want more control over the look and feel of their Silks. We get that and we designed Silk from the beginning with those needs in mind.
Just before New Year's we launched Silk for Nonprofits with help from TechSoup. A key part of that launch was unveiling TechSoup Impact Stories, a Silk that shares the stories of TechSoup technology grant recipients with maps, data and visualizations of 900+ nonprofits. This was also the first Silk with our new Silk for Brands offering.
Data from TechSoup Impact Stories
Introducing Silk for Brands
Silk for Brands, our premium package, includes:
5000 datacards
if needed, a private, password protectable Silk
brand logos on every page
a custom domain name
custom fonts for text blocks
custom header colors to match brand colors
Brands using the package still get all the benefits of Silk. All other aspects of the user experience are the same as for regular silk.co accounts. Silk for Brands customers will continue to be hosted on our secure, reliable platform.
What TechSoup and other brands we are speaking to have told us is that they are using Silk as a platform for branded, data-driven content that non-technical teams can build and manage quickly and easily. That Silk "just works" on mobile and generates shareable embeds, too, is a major plus. You should first explore Silk as a platform for storytelling and building interactive data visualizations.
If it feels right for your brand, we're more than happy to talk about possibilities and pricing. Thanks and we look forward to your feedback.
One of our favorite parts of building Silk is seeing all the really cool things that people create with our platform in nonprofits, data journalism, with maps, and more. Rather than write a long-winded post, I'm just going to list out some of our favorite Silks for the past year. Check them out, learn from them and enjoy. Happy New Year and to many more Silks in 2016!
Where Do Americans Live?
This Silk by Lyman Stone contains data on U.S. interstate migration that illuminates surprising trends. Lyman weaves data and text beautifully. He also published his Silk visualizations in Medium!
Data from historicpopulation.silk.co
O Que Estamos Fazendo Com Nossas IPAs?
The title of this Silk in Portuguese is "What are we doing with our IPAs?" and it's all about the nascent and quite frothy Brazilian IPA brewing business. It has lots of pictures, maps and data and paints a joyful story of who's making this type of suds in Brazil.
From quanto-vivem-as-ipas-nacionais.silk.co
Restaurant Violations in Tampa Florida
This data journalism Silk is made by students at the University of Tampa with public data on restaurant health code violations in Tampa, Florida.
From Tampa Bay Restaurant Data
Prohaska Consulting
We love this Silk because it shows how a consulting company can effectively turn a spreadsheet into a Website that has maps, employee directories, grouping by area of expertise and more information far more quickly than building something on other platforms.
SMEX Digital Rights Datasets
This is actually a unique dataset on digital rights in the Arab countries created by the nonprofit SMEX. It incorporates videos, slides, charts and more to create a powerful research and advocacy tool.
TechSoup Impact Stories
TechSoup is the largest provider of free and low-cost technology consulting and software in the world. They created a Silk datacard with an image for each of over 900 success stories and plotted impact on a map.
30 Mhz (Sensie)
Sensie used Silk as a way to highlight interesting use cases for their Internet-of-Things sensor software and platform.
Vermont's Lake Champlain Cleanup Plan, Explained
This is a project by data journalist and consultant Hilary Niles created with Vermont Public Radio. It uses Silk as a rich storytelling canvas to explain with charts, maps and numbers the data behind cleaning up this iconic lake in Vermont.
This is just a small sampling. We can't wait to show you the amazing Silks that we know will be built in the coming year.
Silk as Solution to Talent Sourcing Overload: Guest Blog
Editor's Note: This blog post was originally published by Silk user Aaron Lintz on SourceCon. Aaron is a Sr. Talent Sourcer at CommVault.
Lately I've been testing some new tools to help better understand and visualize the often large amounts of data that I use for sourcing talent and new hires. I live in spreadsheets. While I can easily pivot, slice, and plot data in Excel, sometimes I need a purpose-built data tool to see the underlying context of the data. After testing enterprise-level platforms that were overkill for my needs, I found Silk.co and I wanted to share the joy.
Silk.co bills itself as a place to publish data with interactive visualizations. This free tool was originally marketed as a great way for reporters and causes to share datasets. Unlike Tableau ($500-999 per user), Silk is currently free with the option for private, collaborative projects for teams. (Ed. Note - We will be charging soon). Silk's team has also included user rights management and good privacy settings, all in a platform with a lighter learning curve.
As an example, I created this public Silk project with members from a Docker Meetup group in New York City. I've always had issues dealing with Meetup groups. Curiously, location is not actually a requirement to join a group.
First, I created an image mosaic with thumbnail image links and data that I pulled using Meetup's API via another tool I love, Blockspring. It's possible to use the filters on the mosaic to dig into the dataset. Some datacards don't have images because the users did not publish images on their Meetup profile. Here I filtered only for profiles with images.
Data from docker.silk.co
Clicking on any profile picture will bring up their "Datacard". Each datacard is equivalent to a row on a spreadsheet. Each datacard is also a standalone Webpage. In my Silk, a datacard contains their user name with their profile link, Twitter ID (if listed in profile), total number of groups they are a member of, and other topics of interest. Here is Harshil's public meetup profile for reference. I can also easily convert the Twitter handle into a live Twitter stream on the page. This can be useful for providing additional personal context when reaching out to potential candidates.
Back on the homepage below the Mosaic, I created a map using Google's natural language system. Silk uses Google Maps as its mapping engine. One of the data elements that the Meetup user profile contains is city/state combination. While Meetup does have longitude and latitude data in the API, Google Maps generates a lat-long pairing from the same city/state data the user enters.
You can also pick which data points are shown in the mini-datacard preview when you click on each pin. If you want, you can even display images on the pins. If you have more than one person per location pin because they have the same location on their profile, you can drill down into the datacards without leaving the page. Silk maps and all other visualizations also work well on mobile devices.
Mapping proved to be the most useful tool for my sourcing needs. I was able to hone in on the people who live nearby and compare that to their hometown. Any good recruiter will tell you that someone who has never left home is unlikely to move, even for the ideal position. Context from their interests and biography helped me craft personalized outreach messages.
Near the bottom of the page I added a column chart to show users' membership counts to get a sense of their activity on Meetup. Then, I added a few easy-to-embed multimedia items from other sources to help illustrate how Silk can be used in other ways. You can embed audio, video, slides, PDF docs, and images on any Silk.
Finally, I decided to play with the interests tags some more by creating a new page. The homepage was growing too long. Silk lets you create multiple pages, just like WordPress or Squarespace. Pages in a Silk are different than datacards. Datacards contain an embedded data table. Pages don't have data tables. They are more like freehand canvases. All the elements on a page or a datacard, though, can be dragged and dropped around to customize your layout.
Using the same dataset, I grouped names by interest only and applied additional filters. Here I have filtered only group members interested in Openstack who live in New Jersey. I didn't want to complicate this example so I can only select one specific topic at a time. You will see some duplicate names showing results sorted by their other Interests.
Data from docker.silk.co
I've already found several uses for Silk.co in my sourcing workflow. Their team has been responsive to my questions. Their technology roadmap includes direct integration with Import.io (Ed. Note - We're happy to announce we launched this recently) and automatic updates from Google Sheets. Anyone having a difficult time dealing with large spreadsheets should take Silk for a test drive before investing in more complex solutions to data overload.
We are very pleased to announce the launch of Silk for Nonprofits. What does this mean? For nonprofits that want to publish data and stories on Silk, we now offer:
Free consultations with our data team on how to structure your data
Free help with setup and creation of your Silk
A free private Silk for internal usage and projects
Discounts on custom branding and custom domains
Marketing assistance and social media support from Silk's marketing team
Learn About Silk for Nonprofits
TechSoup, the leading technology resource organization for the nonprofit world, is supporting our launch, distributing news of Silk for Nonprofits through its newsletters, blogs, forums and social channels to nonprofit and nongovernmental organizations around the world.
Aside from having more than 1 million members, TechSoup is also one of Silk's most inspiring customers. This year they started using Silk to map, categorize and communicate the impact stories of TechSoup clients and technology aid recipients including:
All The Children Are Children, which operates a school for 60 students in Guyotin-Coco, one of the poorest rural communities in Haiti
The Kosch-Westerman Foundation, which connects terminally-ill children to their classmates and to the outside world
Boulder Food and Rescue, which redistributes food that would otherwise be discarded to needy people
Building Silks From Spreadsheets With No Code, Minimal Tech Skills
TechSoup was able to build this extensive Silk with nearly 1,000 pages from a spreadsheet. No coding or advanced spreadsheet skills were required. You can view some of the TechSoup impact stories below in the mosaic.
Data from impact.techsoup.org
"Based on our own overwhelmingly positive experience, we are happy to share Silk's offer with the nonprofit, nongovernmental organizations we serve around the world," says Chris Worman, Senior Director, Alliances & Community Engagement at TechSoup. "Visualizing data in compelling ways is vitally important to the sector as it seeks to both show its impact and raise awareness. Silk lowers barriers to entry in this space and helps even the least technical amongst us turn spreadsheets into impact reports, project maps, galleries and more."
Why We Are Launching Silk for Nonprofits Now
At Silk, we've seen a large spike in the number of nonprofits using our platform for building aid maps, impact reports, resource directories, and social content. Human Rights Watch, Forest Ethics and The Metropolitan Museum of Art are just a few of our most noted users. We also love the fact that smaller, local, and regional nonprofits like Bay Area LISC in Oakland, California, and Social Media Exchange in Beirut, Lebanon, are also using Silk. We get regular inquiries, so we wanted to formalize a program to serve the millions of nonprofits.
This year, like none in recent memory, nonprofits have faced tremendous demands for their services. The influx of millions of refugees in Europe, the global rise of extreme weather, and other unforeseen events conspired to make 2015 a busy year for social good organizations. At the same time, the funding environment for nonprofits remains challenging. More than ever, they need simple-to-use, cost effective (or free) ways to communicate their stories and share their data.
Nonprofits also often have very interesting data. The dataset on digital rights laws in the Arab world built by SMEX is unique. Non-profit media customers like Bellingcat not only collect but also verify and augment data on Russian airstrikes in Syria, creating another unique resource. Nonprofits that collect amazing data generally lack the resources to develop custom websites, maintain databases, or create complex data visualizations.
Silk covers all those capabilities nicely and allows nonprofits to build complete projects in a matter of hours that might take a full-time developer days or weeks. If you are at a nonprofit, or know one that could use our help, please check out the program page and send us an email. We look forward to helping many more nonprofits publish their data and advance their missions in the world. Happy holidays and happy data publishing from Silk. Thanks for reading.
Silk + Import.io: Instantly Transform Web Pages into Data Visualizations
Recognize the page in the image below? Of course you do. This is the Apple Watch page from Apple.com. It's a beautiful page. But it is not just that. Underneath lies a table of data that looks a lot like a spreadsheet. Each table row contains information on a watch model. Each table column contains information on a specific aspect of the watches, such as price, product descriptions, or a link to an image.
What if you want to use that data trapped in the Apple carousel to build your own gallery or a chart, or a sortable table of Apple Watch data? Thanks to Silk and Import.io's new partnership now you can... in only a few seconds. All you need to do is paste the URL into the "Extract data from Website" box in your Silk dashboard. You don't need to know any code. There is no plug-in. It just works.
Sounds too good to be true? Then check out the video below to see the new Silk-Import.io seamless data extraction tool in action on a live Apple Watch product detail page.
And, just like you saw in the video, here is all the watch data in a Silk gallery. You can click through to see all the data on the Silk datacards. You can even build your own visualizations - charts, galleries, groups or mosaics from the data. Or you can try it yourself. It literally takes a minute. (If you are not a Silk user, you can sign up for free and extract data from the Apple Watch page or millions of other pages, too).
OK, we hope this blew your mind. Now, we'll answer some obvious questions.
So how does this really work? Magic, quite literally. Silk has integrated with Import.io's Magic API. Magic is the data extraction engine that looks at the contents of a Web page and automatically identifies structured data. When a Silk user pastes a URL into the "Extract from Website" box on their Dashboard, Silk sends that URL over to Magic. Magic extracts the structured data from the page and converts it into a table format. Magic then sends the table back to Silk. Silk imports the table data and converts that data into a Silk site.
Cool, Apple Watch. What other pages can I use this on? This isn't a one-trick wonder. Want to convert the CB Insights Unicorn Tracker into a live database and analytics platform for unicorns? Cut, paste, visualize, analyze. How about San Francisco real estate listings on Zillow? In fact, there are millions of pages on the Internet that are either entirely well structured in a table format or have portions that are well structured.
How about pagination? The Silk-Import.io integration supports limited pagination. Basically, we'll look for pagination from Magic and automatically import the first 5 pages of data. We anticipate this should handle most of the data extraction requests. If users demand deeper pagination, we expect adding that shouldn't be difficult.
Will this work on all Web pages? Unfortunately, no. The Silk + Import.io Magic data extraction process only works on Web pages that have a clean, recognizable table structure. This could be a table hidden in a carousel (like the example above), a table lurking behind a map, or a table that is disguised as an image grid. But without clean structure, the extraction doesn't work. Also, Web pages that are heavy on JavaScript or are dynamically served will probably not work. Still, this leaves millions of Web pages that will work very well and allow for clean data extraction. For more ideas on Web sites that import fairly cleanly into Silk, check out our Extract URLs gallery (published as a Silk, naturally).
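To make the flow in these answers concrete, here is a small Python sketch of the general idea: pull rows out of pages that contain a clean table, and stop after the first 5 pages. It is not Import.io's Magic API and not Silk's importer. The `TableExtractor` class, the `import_paginated` function and the sample pages are all hypothetical stand-ins built on Python's standard library, and Magic recognizes far more than literal `<table>` markup.

```python
from html.parser import HTMLParser

MAX_PAGES = 5  # the post says the first 5 pages of data are imported

class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def extract_rows(html):
    """Return the table rows found in one page of HTML."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows

def import_paginated(pages):
    """Extract rows from at most MAX_PAGES pages (an iterable of HTML strings)."""
    rows = []
    for index, html in enumerate(pages):
        if index >= MAX_PAGES:
            break
        rows.extend(extract_rows(html))
    return rows

# Seven invented result pages, one table row each.
pages = [
    f"<table><tr><td>Watch {i}</td><td>${i * 100}</td></tr></table>"
    for i in range(1, 8)
]
imported = import_paginated(pages)
```

Only the rows from the first five sample pages end up in `imported`. A page with no recognizable table structure simply yields no rows, which mirrors the limitation described above.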
Want to try it yourself?
Sign up for Silk.
Paste a URL into the "Extract from Website" box in your dashboard.
The data from that URL is now in Silk! Publish the visualizations on your Silk home page or on your blog.
Fall Cleaning: A Better Interface, and More Improvements
After shipping big features like our new layout options and dynamic datacard templates, we thought it was time for some fall cleaning. Let me show you how a series of small but much requested tweaks and a bit of new paint makes for a nicer Silk.
A more sophisticated user interface
To help your readers focus more easily on the content in the body of your Silk, we removed the bright colors in the navigation and added spacing around the widgets and menus. We think this layout is cleaner and easier on the eyes.
Improvements to visualizations
Silk got smarter when switching visualizations through the explore tab. When selecting a visualization, it looks at your data and selects the tags that make sense. For example, Silk will automatically recognize location data for maps and numerical data for charts. So when we create a column chart on solarsystem.silk.co, Silk automatically creates a chart plotting the mass of all planets:
Improvements to the dashboard
The dashboard now loads more quickly than before
Inviting someone to your Silk is easier
Accessing settings is also easier
There are more ways to delete pages
You can now double click on a collection's name to rename it.
Various other improvements
External links now open in new tabs by popular request.
The carousel that appears when you click on the Explore, Datacards, or Pages tab has a smoother scroll action. Scroll through the carousel horizontally, and the pages will fly by. This makes it easier to navigate through your datacards.
We introduced an in-app help option. You can reach it by clicking the "Get Help" button in the footer on the bottom-right of the screen. This means that you can ask us anything from inside Silk.co.
How You Can Create Interactive Presentations with Data Visualizations in 4 Easy Steps
This is a guest post by Edouard from Bunkr, a presentation tool that displays any online content.
Having data in your presentation can be the difference between engaging with your audience and losing them after a slide. Whether it's a map or a chart, data is an important form of content to not only engage with your audience, but also to strengthen your points. Unfortunately, most data publishing platforms are paid services, which is why I was incredibly pleased to discover Silk, and integrate it seamlessly with our tool, Bunkr.
We created Bunkr for professionals who want to present their work with a single click. The presentations are completely web-based and display all the content you create and keep in the cloud. Bunkr is unlike any other presentation tool, allowing you to effortlessly create and update elegant presentations. Take a look at this example: bunkrapp.com/present/tzms2y
With Silk, we've found it incredibly easy to create interactive visualizations, and then publish them on Bunkr. And the more data we threw on, the better and the more interesting the Silks became. Silk transformed numbers into charts, locations into maps, and images into galleries. And the best part? Silks are completely embeddable and can add more life to your Bunkr presentations.
So, now that you've created some cool visualizations using Silk, what next? Adding them to Bunkr, in 4 easy steps.
Find the Silk visualization that you would like to share and click on Share & Embed
Click on Embed and click on Copy to Clipboard
Go back to Bunkr and click on the More button
Paste the embed code in the box and click on the Add button
Here is what a Bunkr presentation with embedded Silk visualizations looks like:
Below is a video that shows the process of making it step-by-step: