
Introduction to Linear Regression - Predicting Bike Usage in Madrid

Start to Finish Data Science with Anaconda
Anaconda 4.3, early 2017

Basic UMAP Usage and Parameters
UMAP is a fairly flexible non-linear dimension reduction algorithm. It seeks to learn the manifold structure of your data and find a low dimensional embedding that preserves the essential topological structure of that manifold. In this notebook we will generate some visualisable 4-dimensional data, demonstrate how to use UMAP to provide a 2-dimensional representation of it, and then look at how various UMAP parameters can impact the resulting embedding. This notebook is based on the work of Philippe Rivière for visionscarto.net.
To start we'll need some basic libraries. First numpy will be needed for basic array manipulation. Since we will be visualising the results we will need matplotlib and seaborn. Finally we will need umap for doing the dimension reduction itself.

Tutorial based on the No bullshit guide series of textbooks by Ivan Savov

Number of data-science related repos on Github by year
Search API docs

We'll cut down the size of the DataFrame we're working with by throwing out weapon data for weapons with indices over three (which are really only present on ships anyway).

Results from arduino radiation-board
Revived a forgotten Libelium/Cooking Hacks radiation Arduino shield.
The same board + J305βγ tube had produced these results ~5 years back: http://ankostis.blogspot.it/2011/07/blog-post.html
github: https://github.com/ankostis/radiationsensor

Functional preference profiles
Note: This analysis is almost identical to this one:
https://github.com/adelavega/neurosynth-lfc/blob/master/Functional preference profiles.ipynb
Here, I'll take advantage of Neurosynth's semantic data to assign a function to each sub-component of the default network.
For each region in the clustering analysis, we're going to determine how well we can classify studies that activated the region, versus those that did not, on the basis of latent topics describing the psychological states in each study.

Require AmberTools
conda install ambertools=17 -c http://ambermd.org/downloads/ambertools/conda/

Evaluating Burgers Equation with different CFD Schemes
We've already examined Burgers' equation in both 1D and 2D (Step 4 and Step 8, respectively). Here we restrict ourselves to the 1D Burgers' equation and examine the role that different schemes play in discretizing non-linear 1st-order hyperbolic equations.
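As an illustrative aside (not the notebook's own code), here is a hedged sketch of one time step of the 1D inviscid Burgers equation under two discretizations, the non-conservative upwind form and the conservative flux form; the grid, time step, and initial condition are made up:

```python
import numpy as np

nx, dt = 101, 0.001
x = np.linspace(0, 2 * np.pi, nx)
dx = x[1] - x[0]
u0 = 1.0 + 0.5 * np.sin(x)  # smooth, everywhere-positive initial condition

# Non-conservative upwind (backward difference), valid here since u > 0.
u_up = u0.copy()
u_up[1:] = u0[1:] - dt / dx * u0[1:] * (u0[1:] - u0[:-1])

# Conservative form with flux F(u) = u**2 / 2, also upwinded.
F = u0 ** 2 / 2
u_cons = u0.copy()
u_cons[1:] = u0[1:] - dt / dx * (F[1:] - F[:-1])

# The two schemes agree closely on smooth data but can differ near shocks.
print(np.abs(u_up - u_cons).max())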

Author: Jake Vanderplas: http://jakevdp.github.io
This is an example of embedding an animation via javascript into an IPython notebook.
It requires the JSAnimation import, available at
http://github.com/jakevdp/JSAnimation.
The animation widget makes use of the HTML5 slider element, which is not yet supported by
Firefox and some other browsers. For a comprehensive list of browser support for
this element, see http://caniuse.com/input-range.
This notebook contains a simple animation example which is displayed inline using the
JSAnimation IPython_display plugin.

Probabilistic Algorithms: Approximate Counting, *LogLog and Bloom Filters

SEGMENT GPS Analysis
Notes
UTM 36North
EPSG:32636

Step 1 - Model Training
Now that we have a feel for the data we are dealing with, we can start designing our model. In this notebook, we will define the network architecture and train the model. We will also discuss some of the transformations on the data in response to the observations that we made in the data exploration section of the notebook.
Let us start by importing some libraries.

To run all cells: Cell -> Run All
To run a selected cell: First click on it, then hit Ctrl + Enter

Table of Contents
1 WESTERN TRADE COAST HYDROLOGIC MODEL
2 IMPORT DEPENDENCIES
3 THE STUDY AREA
3.1 Leaflet slippy map
3.2 Set map extents
3.3 Initial Cartopy plot of the study area
4 FINITE DIFFERENCE GRID
4.1 Helper functions to construct a gradually-refined grid
4.2 Plot grid on leaflet map (uses mplleaflet)
5 NUMERICAL MODEL
5.1 Create workspace directory, set path to MODFLOW executable, model name, etc.
5.2 DIS package (spatial and temporal discretisation)
5.2.1 STEADY STATE
5.2.1.1 Temporal discretisation
5.2.1.2 Initial spatial discretisation
5.2.1.3 Create DIS package object
5.2.2 TRANSIENT
5.2.2.1 Temporal discretisation
5.2.3 LAYER ELEVATIONS
5.2.3.1 Set spatial reference for model grid
5.2.3.2 Model top elevations (DEM)
5.2.3.3 Superficial aquifer - bottom elevations
5.2.3.4 Create final DIS package object
5.3 BAS package (boundary conditions)
5.3.1 For clarity, reorganise the order of columns in the MODFLOW geodataframe
5.4 GHB package (general head boundaries)
5.5 LPF package (hydraulic properties)
5.5.1 Hydraulic conductivity zones
5.5.2 Homogeneous HK (one zone)
5.5.3 Homogeneous HK (multiple zones)
5.5.4 Heterogeneous HK (pilot points + Radial Basis Functions)
5.5.5 Heterogeneous HK (pilot points + KNN regressor)
5.6 EVT package (evapotranspiration)
5.6.1 STEADY STATE
5.6.1.1 Set parameters
5.6.1.2 Create EVT package object
5.6.2 TRANSIENT
5.6.2.1 Read SILO historical records
5.6.2.2 Display complete record
5.6.2.3 pre-1970 record
5.6.2.4 Change units (to L/T)
5.6.2.5 Create EVT package object
5.7 DRN package (drains)
5.8 WEL package (pumping)
5.9 RCH package (recharge)
5.9.1 STEADY STATE
5.9.1.1 Create RCH package object
5.9.2 TRANSIENT
5.10 SWI2 package (seawater intrusion)
5.11 LAK package (main wetlands, MAR infiltration basins)
5.12 PCG and OC packages (solver and output control)
5.13 Write MODFLOW input files and run model
5.14 Plot heads and DTWT using ModelMap
6 CALIBRATION
6.1 Calibration targets (GWL observations)
6.1.1 Load pickle
6.1.2 Drop duplicates
6.1.3 Quality check drilling depth of calibration target bores
6.1.4 Drop records with drilling depth likely deeper than superficial aquifer
7 FORECASTING SIMULATIONS
8 SANDBOX
8.1 Compute drain length in cell
9 HELPER FUNCTIONS
9.1 Land Use
9.1.1 reclassify_LU_MRS
9.1.2 BATCH_reclassify_LU_MRS
9.1.3 plot_reclassify_LU_MRS
9.1.4 LUraster_to_MODFLOW
9.1.5 plot_raster_to_MODFLOW
9.1.6 plot_raster_to_reclass_to_MODFLOW

Quick PCA visualization (not used later)

A Jupyter notebook to read tracking files and (ultimately) create a data package representation of these

WordCloud of tweet hashtags on the blockchain topic

CubedSphere Basic design: data structure and functionalities

Zika and Microcephaly in Brazil
How weather, population growth, and sanitation have impacted the development of microcephaly cases linked to Zika

Visualizing Geotopics Results
In this notebook, we provide scripts to visualize the results obtained by the GeoTopics method.
Project Page: http://mmathioudakis.github.io/geotopics/
Code Repository: https://github.com/mmathioudakis/geotopics
Setup
To visualize the results, you first need to obtain an access token from Mapbox and set it as the MAPBOX_ACCESS_TOKEN environment variable -- or edit this notebook and pass it directly (see below). For instructions on obtaining an access token, see the official Mapbox instructions.
In what follows, we first import the required libraries and define constants and various functions.

Price to Trailing 12-Month Cashflows
Reference
This is calculated as a simple ratio between price per share and TTM free cashflow (here using the built-in Morningstar valuation ratio as an approximation).
This ratio serves a similar function to the previous two. A future notebook will explore the subtle differences between these metrics, but they largely serve the same purpose. Once again, low values are attractive and high values are unattractive, so the metric must be inverted.
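A toy sketch of the inversion described above, with hypothetical numbers (not from the notebook's data):

```python
# Price-to-cashflow is "lower is better"; taking its reciprocal (the
# cashflow yield) turns it into a "higher is better" ranking factor.
price_per_share = 50.0   # hypothetical
ttm_fcf_per_share = 5.0  # hypothetical trailing 12-month free cashflow

p_to_cf = price_per_share / ttm_fcf_per_share  # 10.0, lower is attractive
cf_yield = 1.0 / p_to_cf                       # 0.1, higher is attractive
print(p_to_cf, cf_yield)
```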


Sleep Study
We evaluate the performance of MERF on a famous sleep study dataset with 180 samples and 18 clusters (with 10 samples each).

Sequence to Sequence (seq2seq) Recurrent Neural Network (RNN) for Time Series Prediction
The goal of this project is to let users experiment with the seq2seq neural network architecture by solving simple toy problems in signal prediction. Seq2seq architectures are normally used for more sophisticated purposes than signal prediction, such as language modeling, but this project is an interesting tutorial before moving on to more complicated applications.
The project contains 5 exercises of gradually increasing difficulty. I take for granted that the reader already has at least a basic knowledge of RNNs and of how they can be shaped into an encoder and a decoder of the simplest form (without attention). To learn more about RNNs in TensorFlow, you may want to visit this other project of mine: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition
This project is a series of examples I first built in French, but I haven't had the time to regenerate all the charts with proper English text. I built this project for the practical part of the third hour of a "master class" conference that I gave at the WAQ (Web At Quebec) in March 2017:
https://webaquebec.org/classes-de-maitre/deep-learning-avec-tensorflow
You can find the original French version of this project in the French Git branch: https://github.com/guillaume-chevalier/seq2seq-signal-prediction/tree/francais
How to use this ".ipynb" Python notebook
Although a ".py" Python export of this tutorial is available in the repository, it is more convenient to run the code inside the notebook; the exported ".py" code feels a bit raw.
To run the notebook, you must have Jupyter Notebook (or IPython Notebook) installed. Launch it by typing jupyter notebook (or ipython notebook) on the command line, from the folder containing the downloaded notebook or a parent folder. The notebook application (IDE) will then open in your browser as a local server, where you can open the .ipynb file, run code cells with CTRL+ENTER or SHIFT+ENTER, and restart the kernel or run all cells at once from the menus. Note that this is interesting because such an IDE can be hosted on a cloud server with a lot of GPU power while you code through the browser.
Exercises
Note that the dataset changes depending on the exercise. Most of the time, you will have to edit the neural network's training parameters to complete an exercise, but at a certain point, changes to the architecture itself will be required. The datasets used for these exercises are defined in datasets.py.
Exercise 1
In theory, it is possible to create a perfect prediction of the signal for this exercise. The neural network's parameters have been set to acceptable values for a first training, so you may pass this exercise by running the code without any change. Your first training might produce predictions like this (in yellow), but it is possible to do much better with proper parameter adjustments:
Note: the neural network sees only what is to the left of the chart and is trained to predict what is to the right (predictions in yellow).
We have 2 time series to predict at once, and they are tied together, which means our neural network processes multidimensional data. A simple example would be to receive the past values of multiple stock market symbols, whose values evolve together in time, in order to predict the future values of all those symbols with the neural network. That is what we will do in exercise 5.
Exercise 2
Here, rather than 2 signals to predict in parallel, we have only one, for simplicity. HOWEVER, this signal is a superposition of two sine waves of varying wavelength and offset (restricted to particular minimum and maximum wavelengths).
In order to finish this exercise properly, you will need to edit the neural network's hyperparameters. As an example, here is what can be achieved as a prediction with these better (but still imperfect) training hyperparameters:
nb_iters = 2500
batch_size = 50
hidden_dim = 35
Here are predictions achieved with a bigger neural network of 3 stacked recurrent cells, each with a width of 500 hidden units:
Note that it would be possible to obtain better results with a smaller neural network, given better training hyperparameters, a longer training, added dropout, and so on.
Exercise 3
This exercise is similar to the previous one, except that the input data given to the encoder is noisy while the expected output is not, which makes the task a bit harder. Here is a good example of what a training example (and a prediction) could now look like:
The neural network is therefore driven to denoise the signal in order to infer its future smooth values. Here are some examples of better predictions on this version of the dataset:
As with exercise 2, it would be possible here too to obtain better results. Note that it would also have been possible to ask you to reconstruct the denoised signal from the noisy input rather than predict its future values. That would have been called a "denoising autoencoder"; this type of architecture is also useful for tasks such as data compression and image manipulation.
Exercise 4
The 4th exercise is about editing the neural architecture to make it look like this:
That is, introducing feedback in the decoder, where each output is fed back as input to the next time step to be decoded. This could be compared to hearing your own voice while speaking: haven't you ever noticed how destabilizing speaking into a microphone can be at first? It is because of an offset in the timing of such a recurrence.
Right now, our encoder and decoder use the same cell, but with two separate, different sets of "shared" weights. This is done by the call to tf.nn.seq2seq.basic_rnn_seq2seq; however, to achieve what we want, we shall change our code to stop using that function.
A simple way to make the edits would be to call the recurrent cells on the new time steps (indexes) of the encoder and decoder lists, using two different cells with different names. The cells' __call__ function (that is, the parenthesis operator) can be used. You may find more details here:
The section "Base interface for all RNN Cells": https://www.tensorflow.org/api_guides/python/contrib.rnn
"tf.nn.seq2seq.basic_rnn_seq2seq", line 148 (as of April 2017): https://github.com/petewarden/tensorflow_ios/blob/master/tensorflow/python/ops/seq2seq.py#L148
The comment "This builds an unrolled LSTM for tutorial purposes only.", line 143 (as of April 2017): https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L143
Although this replacement may seem merely formative, it is in this way that TensorFlow users can work up to building more complicated neural architectures, such as plugging an attention RNN decoder on top of a CNN to convert an image into a textual description of it. To learn more about attention mechanisms in RNNs, you might want to watch this talk for the theory.
Exercise 5
This exercise is much harder than the previous ones and is offered more as a suggestion: predicting the future value of the Bitcoin price. We have some daily market data of the Bitcoin's value, namely BTC/USD and BTC/EUR. This is not enough to build a good predictor; at least having data precise to the minute, or the second, would be more interesting. Here is a prediction made on actual future values; the neural network was not trained on the future values shown here, so this is a legitimate prediction, given a model trained well enough on the task:
Disclaimer: this particular prediction of the future values happened to be very good, and you should not expect predictions to always be that good using so little data (side note: the other prediction charts in this project are all "average" except this one). Your task for this exercise is to plug the model into more valuable financial data in order to make more accurate predictions. Let me remind you that the code for the datasets is provided in "datasets.py", but it should be replaced in order to predict Bitcoin accurately.
It would be possible to extend the model's input dimensions beyond (BTC/USD and BTC/EUR). As an example, you could create additional input dimensions/streams containing weather data and more financial data, such as the S&P 500, the Dow Jones, and so on. Other, more creative inputs could be sine waves (or other wave shapes such as saw waves or triangles, or paired sin/cos signals) representing the fluctuation of minutes, hours, days, weeks, months, years, moon cycles, and so on. This could be combined with a Twitter sentiment analysis of the word "Bitcoin" in tweets to add another input signal which is more human-based and abstract. Some libraries exist to convert text to a sentiment value, and there is also the end-to-end neural network approach (but that would be a much more complicated setup). It is also interesting to know where Bitcoin is most used: http://images.google.com/search?tbm=isch&q=bitcoin+heatmap+world
With all the above-mentioned examples, it would be possible to have all of this as input features at every time step: (BTC/USD, BTC/EUR, Dow_Jones, SP_500, hours, days, weeks, months, years, moons, meteo_USA, meteo_EUROPE, Twitter_sentiment). Finally, there could be these two output features, or more: (BTC/USD, BTC/EUR).
This prediction concept can apply to many other things, such as weather prediction and other types of short-term and mid-term statistical predictions.
To change which exercise you are doing, change the value of the following "exercise" variable:

Fully-Connected Neural Nets
In the previous homework you implemented a fully-connected two-layer neural network on CIFAR-10. The implementation was simple but not very modular since the loss and gradient were computed in a single monolithic function. This is manageable for a simple two-layer network, but would become impractical as we move to bigger models. Ideally we want to build networks using a more modular design so that we can implement different layer types in isolation and then snap them together into models with different architectures.
In this exercise we will implement fully-connected networks using a more modular approach. For each layer we will implement a forward and a backward function. The forward function will receive inputs, weights, and other parameters and will return both an output and a cache object storing data needed for the backward pass, like this:
def layer_forward(x, w):
    """ Receive inputs x and weights w """
    # Do some computations ...
    z = # ... some intermediate value
    # Do some more computations ...
    out = # the output
    cache = (x, w, z, out) # Values we need to compute gradients
    return out, cache
The backward pass will receive upstream derivatives and the cache object, and will return gradients with respect to the inputs and weights, like this:
def layer_backward(dout, cache):
    """
    Receive derivative of loss with respect to outputs and cache,
    and compute derivative with respect to inputs.
    """
    # Unpack cache values
    x, w, z, out = cache
    # Use values in cache to compute derivatives
    dx = # Derivative of loss with respect to x
    dw = # Derivative of loss with respect to w
    return dx, dw
After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures.
In addition to implementing fully-connected networks of arbitrary depth, we will also explore different update rules for optimization, and introduce Dropout as a regularizer and Batch Normalization as a tool to more efficiently optimize deep networks.
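As a concrete instance of the forward/backward pattern sketched above (my own sketch, not the assignment's reference solution), here is an affine layer together with a numeric gradient check:

```python
import numpy as np

def affine_forward(x, w):
    out = x.dot(w)        # the output
    cache = (x, w)        # values we need to compute gradients
    return out, cache

def affine_backward(dout, cache):
    x, w = cache
    dx = dout.dot(w.T)    # derivative of loss with respect to x
    dw = x.T.dot(dout)    # derivative of loss with respect to w
    return dx, dw

# Check the backward pass against a numeric gradient of the scalar
# loss L = out.sum(), for which dL/dout is a matrix of ones.
rng = np.random.RandomState(0)
x, w = rng.randn(4, 3), rng.randn(3, 2)
out, cache = affine_forward(x, w)
dx, dw = affine_backward(np.ones_like(out), cache)

h = 1e-6
num_dw = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp = w.copy()
        wp[i, j] += h
        num_dw[i, j] = (x.dot(wp).sum() - x.dot(w).sum()) / h
print(np.abs(dw - num_dw).max())  # should be tiny
```

Once a layer passes a check like this, it can be snapped together with other layers exactly as the text describes.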

Animation
based on:
http://matplotlib.org/users/whats_new.html#display-hook-for-animations-in-the-ipython-notebook
http://louistiao.me/posts/notebooks/embedding-matplotlib-animations-in-jupyter-notebooks/

Time Series Regression using Air Passengers Data

Things to do with the training set
Identify columns with missing values and analyse them further. Check their relationships with the target variable.
Convert object columns to categorical/dummy variables via one-hot encoding.
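The two to-do items above can be sketched in pandas on a made-up toy frame (the real training set's column names are not shown in this note):

```python
import pandas as pd

# Hypothetical columns standing in for the real training set.
df = pd.DataFrame({"colour": ["red", "blue", None, "green"],
                   "price": [3.0, 2.5, 1.0, 4.2]})

missing = df.isnull().sum()                       # missing-value count per column
encoded = pd.get_dummies(df, columns=["colour"])  # one-hot encode the object column
print(missing["colour"], encoded.columns.tolist())
```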

This notebook has all the code required to emulate the results from a simulation of tumour growth shown in:
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A., & Sottoriva, A. (2016). Identification of neutral tumor evolution across cancer types. Nature Genetics. http://doi.org/10.1038/ng.3489
The notebook is laid out as follows: the first cell includes code to load the required packages and the module NeutralEvolution, which defines all the types and functions used in the notebook. The following sections each have headings and short descriptions of the results generated, and produce figures equivalent to those in the publication. In the publication, outputs from the simulations were saved and the figures were generated in R (v3); in this notebook we use the Gadfly package to produce inline plots.

Confidence Intervals
We have developed a method for estimating a parameter by using random sampling and the bootstrap. Our method produces an interval of estimates, to account for chance variability in the random sample. By providing an interval of estimates instead of just one estimate, we give ourselves some wiggle room.
In the previous example we saw that our process of estimation produced a good interval about 95% of the time, a "good" interval being one that contains the parameter. We say that we are 95% confident that the process results in a good interval. Our interval of estimates is called a 95% confidence interval for the parameter, and 95% is called the confidence level of the interval.
The situation in the previous example was a bit unusual. Because we happened to know the value of the parameter, we were able to check whether an interval was good or a dud, and this in turn helped us to see that our process of estimation captured the parameter about 95 out of every 100 times we used it.
But usually, data scientists don't know the value of the parameter. That is the reason they want to estimate it in the first place. In such situations, they provide an interval of estimates for the unknown parameter by using methods like the one we have developed. Because of statistical theory and demonstrations like the one we have seen, data scientists can be confident that their process of generating the interval results in a good interval a known percent of the time.
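A minimal sketch of the bootstrap percentile method this passage describes, using only the standard library; the sample and the number of resamples are illustrative:

```python
import random
import statistics

# Draw a sample, then resample it with replacement many times, computing
# the statistic (here the mean) each time; the middle 95% of those
# resampled estimates forms an approximate 95% confidence interval.
random.seed(0)
sample = [random.gauss(100, 15) for _ in range(500)]

estimates = []
for _ in range(2000):
    resample = random.choices(sample, k=len(sample))
    estimates.append(statistics.mean(resample))

estimates.sort()
lo = estimates[int(0.025 * len(estimates))]
hi = estimates[int(0.975 * len(estimates))]
print(lo, hi)  # an approximate 95% confidence interval for the mean
```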

Optical simulation: wavefront simulation
With this module, you can run optical simulations.
Wonderful!
Reference page: Opticspy

Variational Inference: Bayesian Neural Networks
(c) 2016 by Thomas Wiecki
Original blog post: http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/

Table of Contents
1 Load libraries
2 Load previously trained model
3 Generate predictions
4 Find bounding boxes
5 Cut the images based on bounding box
6 Test set predictions

Model Data Services (MDS)
This notebook is an early demonstration showing some of the benefits of having model data exposed via MDS (OPeNDAP).
Here, we explore model output without ever needing a copy of the data to be stored on the user's machine - instead, operations are generally lazy, and thus can minimise the amount of data requested from the upstream data provider.

pomegranate / hmmlearn comparison
hmmlearn is a Python module for hidden Markov models with a scikit-learn-like API. It was originally part of scikit-learn until its removal, because structural learning did not mesh well with the API of many other classical machine learning algorithms. Here is a table highlighting some of the similarities and differences between the two packages.
| Feature | pomegranate | hmmlearn |
| --- | --- | --- |
| **Graph Structure** | | |
| Silent States | ✓ | |
| Optional Explicit End State | ✓ | |
| Sparse Implementation | ✓ | |
| Arbitrary Emissions Allowed on States | ✓ | |
| Discrete/Gaussian/GMM Emissions | ✓ | ✓ |
| Large Library of Other Emissions | ✓ | |
| Build Model from Matrices | ✓ | ✓ |
| Build Model Node-by-Node | ✓ | |
| Serialize to JSON | ✓ | |
| Serialize using Pickle/Joblib | | ✓ |
| **Algorithms** | | |
| Priors | | ✓ |
| Sampling | ✓ | ✓ |
| Log Probability Scoring | ✓ | ✓ |
| Forward-Backward Emissions | ✓ | ✓ |
| Forward-Backward Transitions | ✓ | |
| Viterbi Decoding | ✓ | ✓ |
| MAP Decoding | ✓ | ✓ |
| Baum-Welch Training | ✓ | ✓ |
| Viterbi Training | ✓ | |
| Labeled Training | ✓ | |
| Tied Emissions | ✓ | |
| Tied Transitions | ✓ | |
| Emission Inertia | ✓ | |
| Transition Inertia | ✓ | |
| Emission Freezing | ✓ | ✓ |
| Transition Freezing | ✓ | ✓ |
| Multi-threaded Training | ✓ | Coming Soon |
Just because both packages implement a feature doesn't speak to how fast it is. Below we investigate how fast the two packages are in the different settings both have implemented.
Fully Connected Graphs with Multivariate Gaussian Emissions
Let's look at sampling, the scoring method, Viterbi, and Baum-Welch training for fully connected graphs with multivariate Gaussian emissions. A fully connected graph is one where all states have connections to all other states. This is a case where pomegranate is expected to do poorly due to its sparse implementation, and hmmlearn should shine due to its vectorized implementations.

When will the Arctic be ice-free during summer?
One can look at the sea ice extent (first part of this notebook) or the sea ice volume (second part of the notebook).
The least amount of ice during the year occurs in September, so we plot the September sea ice extent (in square km) from 1979 to 2014.
Original data with a detailed description is available at: http://nsidc.org/data/docs/noaa/g02135_seaice_index/
We follow Section 3, subsection "Monthly Sea Ice Extent Anomaly Graphs".

A bit of searching for all the digits...

Data Carpentry Reproducible Research Workshop - Data Exploration

Data Analysis and Visualization
For today's workshop we will be using the pandas library, the matplotlib library, and the seaborn library. Also, we will read data from the web with the pandas-datareader. By the end of the workshop, participants should be able to use Python to tell a story about a dataset they build from an open data source.
GOALS:
Understand basic functionality of Pandas DataFrame
Use Matplotlib to visualize data
Use Seaborn to explore data
Import data from the web with pandas-datareader and compare development indicators from the World Bank

Jupyter Octave Kernel
Interact with Octave in the Notebook. All commands are interpreted by Octave. Since this is a MetaKernel, a standard set of magics is available. Help on a command is available using the %help magic or using ? with the command.

How long a series to maximize expected winnings
Riddler Classic 2017-07-14
https://fivethirtyeight.com/features/can-you-eat-more-pizza-than-your-siblings/
Congratulations! The Acme Axegrinders, which you own, are the regular season champions of the National Squishyball League (NSL). Your team will now play a championship series against the Boondocks Barbarians, which had the second-best regular season record. You feel good about Acme’s chances in the series because Acme won exactly 60 percent of the hundreds of games it played against Boondocks this season. (The NSL has an incredibly long regular season.) The NSL has two special rules for the playoffs:
The owner of the top-seeded team (i.e., you) gets to select the length of the championship series in advance of the first game, so you could decide to play a single game, a best two out of three series, a three out of five series, etc., all the way up to a 50 out of 99 series.
The owner of the winning team gets $1 million minus $10,000 for each of the victories required to win the series, regardless of how many games the series lasts in total. Thus, if the top-seeded team’s owner selects a single-game championship, the winning owner will collect $990,000. If he or she selects a 4 out of 7 series, the winning team’s owner will collect $960,000. The owner of the losing team gets nothing.
Since Acme has a 60 percent chance of winning any individual game against Boondocks, Rule 1 encourages you to opt for a very long series to improve Acme's chances of winning the series. But Rule 2 means that a long series will mean lower winnings for you if Acme does take the series.
How long a series should you select in order to maximize your expected winnings? And how much money do you expect to win?
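The puzzle reduces to maximizing the probability of taking the series times the prize for that series length. Here is a sketch of that computation (my own, not the notebook's code), using the negative-binomial sum for the series-win probability:

```python
from math import comb

p, q = 0.6, 0.4  # Acme's per-game win and loss probabilities

def p_series(n):
    # Probability Acme collects its nth win before losing n games:
    # sum over k = the number of Acme losses before its nth win.
    return sum(comb(n - 1 + k, k) * p ** n * q ** k for k in range(n))

def expected_winnings(n):
    # The winning owner gets $1,000,000 minus $10,000 per required win;
    # the losing owner gets nothing.
    return p_series(n) * (1_000_000 - 10_000 * n)

best = max(range(1, 51), key=expected_winnings)
print(best, round(expected_winnings(best), 2))
```

`p_series(1)` is just 0.6, and longer series push the win probability toward 1 while shrinking the prize, which is exactly the trade-off the puzzle asks you to balance.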

Exploring DrivePro GPS format
The Transcend DrivePro 220 exports its videos in Quicktime MOV format, in a way that also includes GPS information every second in the video. This information can be viewed using their Windows and/or Mac apps, but not exported. This notebook will attempt to get to the bottom of how the GPS data is stored, so I can use this dashcam to provide data to the OpenStreetView project.

Distributed DataFrames with Dask

Epilepsy Comorbidity Analysis using SCAIView
This notebook contains the quantification of gene overlap between epilepsy and other disorders using text mining.
Authors: Daniel Domingo-Fernández and Charles Tapley Hoyt
The following is the set of queries used in this analysis.
Reference queries:
[MeSH Disease:"Epilepsy"]
[MeSH Disease:"Alzheimer Disease"]
[MeSH Disease:"Tuberculosis"]
[MeSH Disease:"Parkinson Disease"]
[MeSH Disease:"Dementia"]
[MeSH Disease:"Migraine Disorders"]
[MeSH Disease:"Diabetes Mellitus"]
[MeSH Disease:"Colonic Neoplasms"]
[MeSH Disease:"Pulmonary Disease Chronic Obstructive"]
[MeSH Disease:"Peptic Ulcer"]
[MeSH Disease:"Anxiety Disorders"]
[MeSH Disease:"Urinary Incontinence"]
[MeSH Disease:"Cataract"]
[MeSH Disease:"Hypertension"]
[MeSH Disease:"Arthritis"]
Queries used for calculating pleiotropy rates
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Alzheimer Disease"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Parkinson Disease"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Dementia"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Migraine Disorders"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Diabetes Mellitus"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Colonic Neoplasms"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Pulmonary Disease Chronic Obstructive"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Anxiety Disorders"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Urinary Incontinence"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Cataract"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Hypertension"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Arthritis"]
The queries were retrieved using SCAIView version 1.7.3, corresponding to the indexing of MEDLINE on 2016-07-14T13:50:07.797575Z.
*Note that the reference queries might take time since thousands of articles need to be analyzed.

*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*
The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Implementation of Salimans' "Don't Overfit" submission

I haven't posted much content over the past year, as I've been quite preoccupied with other activities. It's time for that to change.
The number and quality of data visualization tools available on the web have increased markedly over the past few years. It is time to bring the presentation of my work into the new era. To this end, I'm retroactively going to update my previous blog posts to be more easily read and used.
Starting today, each of my blog posts will be a standalone Jupyter notebook. This will hopefully allow people to work through the same process I did in achieving a given result. Additionally, my posts which contained static figures output from matplotlib will now be rendered dynamically using tools such as Plotly. What does this mean in practice? Let's take a quick look.

General Assembly Breast Cancer Project
By Brendan Bailey

Content under Creative Commons Attribution license CC-BY 4.0, code under MIT license (c)2014 L.A. Barba, G.F. Forsyth, C. Cooper. Based on CFDPython, (c)2013 L.A. Barba, also under CC-BY license.

Self study from:
https://www.kaggle.com/arthurtok/introduction-to-ensembling-stacking-in-python
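As a minimal sketch of the stacking idea covered in that tutorial (the dataset and base models here are chosen purely for illustration, not taken from the original notebook):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners whose out-of-fold predictions become features
# for the meta-learner (the essence of stacking).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
)
stack.fit(X_tr, y_tr)
print(f"stacked accuracy: {stack.score(X_te, y_te):.2f}")
```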

Build a language detector model
The goal of this exercise is to train a linear classifier on text features that represent sequences of up to 3 consecutive characters, so as to recognize natural languages by using the frequencies of short character sequences as 'fingerprints'.
Author: Olivier Grisel [email protected]
License: Simplified BSD
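The character-frequency "fingerprint" idea can be sketched even without scikit-learn (the sample texts and helper names below are illustrative; the actual exercise uses a character n-gram vectorizer feeding a linear classifier):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Frequency 'fingerprint' of character n-grams up to length n."""
    text = text.lower()
    counts = Counter()
    for k in range(1, n + 1):
        counts.update(text[i:i + k] for i in range(len(text) - k + 1))
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(a) * norm(b))

en = char_ngrams("the quick brown fox jumps over the lazy dog")
fr = char_ngrams("le renard brun rapide saute par-dessus le chien paresseux")
query = char_ngrams("the dog and the fox")

# An English query should look more like the English profile.
print(cosine(query, en) > cosine(query, fr))
```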

Tak exploration
We want to build an AI for Tak, a simple game inspired by the Kingkiller Chronicles by Patrick Rothfuss. I'm not an AI expert (I'm not even a novice), so this will be an exploration: an attempt to develop a game AI, building from a small background.
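A common starting point for a game AI of this kind is depth-limited minimax with alpha-beta pruning. Here is a generic sketch (the toy "add 1 or 2" game at the bottom is ours, purely to exercise the function; it is not Tak):

```python
def minimax(state, depth, alpha, beta, maximizing, moves, apply_move, evaluate):
    """Depth-limited minimax with alpha-beta pruning over a generic game."""
    options = moves(state)
    if depth == 0 or not options:
        return evaluate(state)
    if maximizing:
        best = float("-inf")
        for m in options:
            best = max(best, minimax(apply_move(state, m), depth - 1,
                                     alpha, beta, False, moves, apply_move, evaluate))
            alpha = max(alpha, best)
            if beta <= alpha:  # remaining moves cannot improve the outcome
                break
        return best
    best = float("inf")
    for m in options:
        best = min(best, minimax(apply_move(state, m), depth - 1,
                                 alpha, beta, True, moves, apply_move, evaluate))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# Toy game: each move adds 1 or 2 to a counter; maximizer moves first.
moves = lambda s: [1, 2]
apply_move = lambda s, m: s + m
evaluate = lambda s: s
best_value = minimax(0, 2, float("-inf"), float("inf"), True,
                     moves, apply_move, evaluate)
print(best_value)  # max adds 2, min then adds 1 -> 3
```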

Generate initial state
We use the code created by the ion trapping group to compute the equilibrium positions of ions in the Penning trap. First we import the mode_analysis_code module:

Decision tree (categorical)
Let's try to construct a decision tree for the weather data. If we stick to scikit, unfortunately, we already hit a big limitation of its decision tree implementation:
scikit decision trees are only for numeric data!!
Our weather data, however, is categorical, so that we now need to do the attribute encoding that was discussed earlier.
We cannot simply replace strings with numeric values (e.g., "Sunny" = 1, "Rainy" = 2, etc.), since scikit treats these values as actual numbers, but our data has no ordering. If our data were ordinal, this would make sense (e.g., "Worst" = -2, "Neutral" = 0, "Best" = 2).
So, we have to encode our values in a one-hot encoding. There are a variety of approaches for this; let's use pandas for now.
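A minimal sketch of the pandas route (the toy weather values below are assumed for illustration):

```python
import pandas as pd

# Toy categorical weather data, similar in spirit to the dataset discussed.
weather = pd.DataFrame({
    "Outlook": ["Sunny", "Rainy", "Overcast", "Sunny"],
    "Windy": ["Yes", "No", "No", "Yes"],
})

# pd.get_dummies expands each categorical column into one indicator column
# per distinct value -- exactly the numeric form scikit's trees can consume.
encoded = pd.get_dummies(weather)
print(encoded.columns.tolist())
```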

Hello Networkx and Planarity
NetworkX Homepage
Planarity's Github Page
Pre-generated Graph DB - On Google Drive
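As a quick taste of the libraries linked above, NetworkX ships its own planarity test, `check_planarity` (the separate Planarity package provides an alternative implementation):

```python
import networkx as nx

# check_planarity returns (is_planar, embedding); the embedding is a planar
# combinatorial embedding when the graph is planar, None otherwise.
planar, embedding = nx.check_planarity(nx.complete_graph(4))  # K4 is planar
nonplanar, _ = nx.check_planarity(nx.complete_graph(5))       # K5 is not
print(planar, nonplanar)
```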

Face Generation
In this project, you'll use generative adversarial networks to generate new images of faces.
Get the Data
You'll be using two datasets in this project:
MNIST
CelebA
Since the CelebA dataset is complex and you're building GANs in a project for the first time, we want you to test your neural network on MNIST before CelebA. Running the GANs on MNIST will let you see sooner how well your model trains.
If you're using FloydHub, set data_dir to "/input" and use the FloydHub data ID "R5KrjnANiKVhLWAkpXhNBe".

EDUCATION AND THE LABOUR MARKET
We considered three variables: the 20-64 employment rate, the 30-34 graduate rate, and the rate of over-qualified workers, all for the year 2015. The objective is to assess, in terms of matching labour supply and demand, whether holding a degree guarantees employment commensurate with that qualification.

In this lab we'll look at images as data. In particular, we'll show how to use standard Python libraries to visualize and distort images, switch between matrix and vector representations, and run machine learning to perform a classification task. We'll be using the ever popular MNIST digit recognition data set. This will be a two-part lab, comprising the following steps:
Part 1:
1. Generate random matrices and visualize as images
2. Data Prep and Exploration
-- Load MNIST data
-- Convert the vectors to a matrix and visualize
-- Look at pixel distributions
-- Heuristic feature reduction
-- Split data into train and validation
Part 2:
3. Classification
-- One vs. All using logistic regression
-- Random Forests on all
4. Error analysis
-- Visualize confusion matrix
-- Look at specific errors
5. Generate synthetic data and rerun RF

Welcome to an example Binder