Kyso | share
Introduction to Linear Regression - Predicting Bike Usage in Madrid
Apr 12, 2018 · 58 views · 0 comments
Start to Finish Data Science with Anaconda (Anaconda 4.3, early 2017)
Mar 19, 2018 · 33 views · 0 comments
Basic UMAP Usage and Parameters
UMAP is a fairly flexible non-linear dimension reduction algorithm. It seeks to learn the manifold structure of your data and find a low dimensional embedding that preserves the essential topological structure of that manifold. In this notebook we will generate some visualisable 4-dimensional data, demonstrate how to use UMAP to provide a 2-dimensional representation of it, and then look at how various UMAP parameters can impact the resulting embedding. This notebook is based on the work of Philippe Rivière for visionscarto.net. To start we'll need some basic libraries. First numpy will be needed for basic array manipulation. Since we will be visualising the results we will need matplotlib and seaborn. Finally we will need umap for doing the dimension reduction itself.
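A minimal sketch of that workflow (the two-cluster 4-D data below is an illustrative assumption, not the notebook's dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import umap  # provided by the umap-learn package

sns.set(style="white")

# Some visualisable 4-dimensional data: two Gaussian blobs.
rng = np.random.RandomState(42)
data = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 4)),
    rng.normal(4.0, 1.0, size=(500, 4)),
])

# n_neighbors and min_dist are the main parameters explored in the notebook.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(data)

plt.scatter(embedding[:, 0], embedding[:, 1], s=5)
plt.title("UMAP projection of 4-D data")
plt.show()
```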
Mar 07, 2018 · 50 views · 0 comments
Tutorial based on the No bullshit guide series of textbooks by Ivan Savov
Mar 07, 2018 · 35 views · 0 comments
Number of data-science-related repos on GitHub by year (Search API docs)
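A sketch of how such counts can be pulled from the GitHub Search API (the query string and year range are assumptions; note that unauthenticated requests are rate-limited):

```python
import requests

# Count repositories matching "data science" created in each year,
# using the Search API's total_count field.
for year in range(2013, 2018):
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": f'"data science" created:{year}-01-01..{year}-12-31'},
    )
    print(year, resp.json()["total_count"])
```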
Mar 06, 2018 · 36 views · 0 comments
We'll cut down the size of the DataFrame we're working with by throwing out weapon data for weapons with indices over three (which are really only present on ships anyway).
Mar 06, 2018 · 35 views · 0 comments
Results from Arduino radiation board
Revived a forgotten Libelium/Cooking-hacks radiation Arduino shield. The same board + J305βγ tube had produced these results ~5 years back: http://ankostis.blogspot.it/2011/07/blog-post.html GitHub: https://github.com/ankostis/radiationsensor
Mar 06, 2018 · 25 views · 0 comments
Histograms
Mar 06, 2018 · 34 views · 0 comments
Functional preference profiles
Note: this analysis is almost identical to this one: https://github.com/adelavega/neurosynth-lfc/blob/master/Functional preference profiles.ipynb Here, I'll take advantage of Neurosynth's semantic data to assign function to each sub-component of the default network. For each region in the clustering analysis, we're going to determine how well we can classify studies that activated the region, versus those that did not, on the basis of latent topics describing the psychological states in each study.
Mar 06, 2018 · 13 views · 0 comments
Requires AmberTools: conda install ambertools=17 -c http://ambermd.org/downloads/ambertools/conda/
Mar 06, 2018 · 36 views · 0 comments
Code 13.1
Mar 06, 2018 · 51 views · 0 comments
Evaluating Burgers Equation with different CFD Schemes
We've already examined Burgers equation in both 1D and 2D (Step 4 and Step 8, respectively). Here we are going to restrict ourselves to the 1D Burgers equation and examine the role that different schemes play in discretizing non-linear first-order hyperbolic equations.
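For reference, a minimal first-order upwind discretization of the inviscid 1D Burgers equation (u_t + u u_x = 0), one of the simplest schemes of the kind compared here; the grid sizes and square-wave initial condition are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

nx, nt = 101, 100
dx = 2.0 / (nx - 1)
dt = 0.005  # keeps u*dt/dx <= 0.5 for stability with u <= 2

u = np.ones(nx)
u[int(0.5 / dx):int(1.0 / dx) + 1] = 2.0  # square-wave initial condition

for _ in range(nt):
    un = u.copy()
    # Forward in time, backward (upwind) in space: valid for u > 0.
    u[1:] = un[1:] - un[1:] * dt / dx * (un[1:] - un[:-1])

plt.plot(np.linspace(0, 2, nx), u)
plt.title("1D Burgers equation, first-order upwind scheme")
plt.show()
```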
Mar 06, 2018 · 22 views · 0 comments
Author: Jake Vanderplas: http://jakevdp.github.io This is an example of embedding an animation via JavaScript into an IPython notebook. It requires the JSAnimation import, available at http://github.com/jakevdp/JSAnimation. The animation widget makes use of the HTML5 slider element, which is not yet supported by Firefox and some other browsers. For a comprehensive list of browser support for this element, see http://caniuse.com/input-range. This notebook contains a simple animation example which is displayed inline using the JSAnimation IPython_display plugin.
Mar 06, 2018 · 24 views · 0 comments
Probabilistic Algorithms: Approximate Counting, *LogLog and Bloom Filters
Mar 06, 2018 · 30 views · 0 comments
SEGMENT GPS Analysis Notes (UTM 36 North, EPSG:32636)
Mar 05, 2018 · 30 views · 0 comments
Step 1 - Model Training
Now that we have a feel for the data we are dealing with, we can start designing our model. In this notebook, we will define the network architecture and train the model. We will also discuss some of the transformations on the data in response to the observations that we made in the data exploration section of the notebook. Let us start by importing some libraries.
Mar 05, 2018 · 36 views · 0 comments
To run all cells: Cell -> Run All. To run a selected cell: first click on it, then hit Ctrl + Enter.
Mar 05, 2018 · 29 views · 0 comments
Table of Contents
1 WESTERN TRADE COAST HYDROLOGIC MODEL
2 IMPORT DEPENDENCIES
3 THE STUDY AREA
3.1 Leaflet slippy map
3.2 Set map extents
3.3 Initial Cartopy plot of the study area
4 FINITE DIFFERENCE GRID
4.1 Helper functions to construct a gradually-refined grid
4.2 Plot grid on leaflet map (uses mplleaflet)
5 NUMERICAL MODEL
5.1 Create workspace directory, set path to MODFLOW executable, model name, etc.
5.2 DIS package (spatial and temporal discretisation)
5.2.1 STEADY STATE
5.2.1.1 Temporal discretisation
5.2.1.2 Initial spatial discretisation
5.2.1.3 Create DIS package object
5.2.2 TRANSIENT
5.2.2.1 Temporal discretisation
5.2.3 LAYER ELEVATIONS
5.2.3.1 Set spatial reference for model grid
5.2.3.2 Model top elevations (DEM)
5.2.3.3 Superficial aquifer: bottom elevations
5.2.3.4 Create final DIS package object
5.3 BAS package (boundary conditions)
5.3.1 For clarity, reorganise the order of columns in the MODFLOW geodataframe
5.4 GHB package (general head boundaries)
5.5 LPF package (hydraulic properties)
5.5.1 Hydraulic conductivity zones
5.5.2 Homogeneous HK (one zone)
5.5.3 Homogeneous HK (multiple zones)
5.5.4 Heterogeneous HK (pilot points + Radial Basis Functions)
5.5.5 Heterogeneous HK (pilot points + KNN regressor)
5.6 EVT package (evapotranspiration)
5.6.1 STEADY STATE
5.6.1.1 Set parameters
5.6.1.2 Create EVT package object
5.6.2 TRANSIENT
5.6.2.1 Read SILO historical records
5.6.2.2 Display complete record
5.6.2.3 pre-1970 record
5.6.2.4 Change units (to L/T)
5.6.2.5 Create EVT package object
5.7 DRN package (drains)
5.8 WEL package (pumping)
5.9 RCH package (recharge)
5.9.1 STEADY STATE
5.9.1.1 Create RCH package object
5.9.2 TRANSIENT
5.10 SWI2 Package (seawater intrusion)
5.11 LAK package (main wetlands, MAR infiltration basins)
5.12 PCG and OC packages (solver and output control)
5.13 Write MODFLOW input files and run model
5.14 Plot heads and DTWT using ModelMap
6 CALIBRATION
6.1 Calibration targets (GWL observations)
6.1.1 Load pickle
6.1.2 Drop duplicates
6.1.3 Quality check drilling depth of calibration target bores
6.1.4 Drop records with drilling depth likely deeper than superficial aquifer
7 FORECASTING SIMULATIONS
8 SANDBOX
8.1 Compute drain length in cell
9 HELPER FUNCTIONS
9.1 Land Use
9.1.1 reclassify_LU_MRS
9.1.2 BATCH_reclassify_LU_MRS
9.1.3 plot_reclassify_LU_MRS
9.1.4 LUraster_to_MODFLOW
9.1.5 plot_raster_to_MODFLOW
9.1.6 plot_raster_to_reclass_to_MODFLOW
Mar 05, 2018 · 31 views · 0 comments
Quick PCA visualization (not used later)
Mar 05, 2018 · 20 views · 0 comments
A Jupyter notebook to read tracking files and (ultimately) create a data package representation of these
Mar 02, 2018 · 29 views · 0 comments
Word cloud of tweet hashtags on the blockchain topic
Mar 02, 2018 · 42 views · 0 comments
Imports
Mar 01, 2018 · 27 views · 0 comments
CubedSphere: basic design, data structure and functionalities
Mar 01, 2018 · 25 views · 0 comments
Zika and Microcephaly in Brazil - How weather, population growth, and sanitation have impacted the development of microcephaly cases linked to Zika
Mar 01, 2018 · 28 views · 0 comments
Visualizing Geotopics Results
In this notebook, we provide scripts to visualize the results obtained by the GeoTopics method. Project page: http://mmathioudakis.github.io/geotopics/ Code repository: https://github.com/mmathioudakis/geotopics Setup: to visualize the results, you first need to obtain an access token from Mapbox and set it as the MAPBOX_ACCESS_TOKEN environment variable, or edit this notebook and pass it directly (see below). For instructions on how to obtain an access token, see the official Mapbox instructions. In what follows, we first import the required libraries and define constants and various functions.
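The token setup described above might look like this (a small sketch; the empty fallback string is where you would paste a token directly):

```python
import os

# Read the Mapbox token from the environment, or paste it in directly.
MAPBOX_ACCESS_TOKEN = os.environ.get("MAPBOX_ACCESS_TOKEN", "")
assert MAPBOX_ACCESS_TOKEN, (
    "Set the MAPBOX_ACCESS_TOKEN environment variable "
    "or edit this cell to pass the token directly."
)
```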
Mar 01, 2018 · 40 views · 0 comments
Price to Trailing 12-Month Cashflows
Reference: this is calculated as a simple ratio between price per share and TTM free cashflow (here using the built-in Morningstar valuation ratio as an approximation). This ratio serves a similar function to the previous two. A future notebook will explore the subtle differences in these metrics, but they largely serve the same purpose. Once again, low values are attractive and high values are unattractive, so the metric must be inverted (see the sketch below).
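A toy illustration of that inversion (the tickers and numbers are made up; the notebook itself uses the built-in Morningstar ratio):

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [120.0, 45.0, 300.0], "ttm_fcf_per_share": [8.0, 1.5, 30.0]},
    index=["AAA", "BBB", "CCC"],
)

# Price-to-cashflow is unattractive when high, so invert it into a
# cashflow yield, where higher is better.
df["p_fcf"] = df["price"] / df["ttm_fcf_per_share"]
df["cashflow_yield"] = 1.0 / df["p_fcf"]
print(df.sort_values("cashflow_yield", ascending=False))
```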
Mar 01, 2018 · 20 views · 0 comments
Word cloud of tweet hashtags on the blockchain topic
Mar 01, 2018 · 34 views · 0 comments
Sleep Study
We evaluate the performance of MERF on a famous sleep study dataset with 180 samples and 18 clusters (with 10 samples each).
Mar 01, 2018 · 32 views · 0 comments
Sequence to Sequence (seq2seq) Recurrent Neural Network (RNN) for Time Series Prediction
The goal of this project of mine is to bring users to try and experiment with the seq2seq neural network architecture, by solving simple toy problems about signal prediction. Normally, seq2seq architectures are used for more sophisticated purposes than signal prediction, such as language modeling, but this project is an interesting tutorial on the way to more complicated work. The project contains 5 exercises of gradually increasing difficulty. I take for granted that readers already have at least basic knowledge of RNNs and of how they can be shaped into an encoder and a decoder of the simplest form (without attention). To learn more about RNNs in TensorFlow, you may want to visit this other project of mine: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition

I first built this series of examples in French, but I haven't had the time to regenerate all the charts with proper English text. I built this project for the practical part of the third hour of a "master class" conference that I gave at the WAQ (Web At Quebec) in March 2017: https://webaquebec.org/classes-de-maitre/deep-learning-avec-tensorflow You can find the original French version of this project in the French Git branch: https://github.com/guillaume-chevalier/seq2seq-signal-prediction/tree/francais

How to use this ".ipynb" Python notebook? Although a ".py" Python version of this tutorial is available in the repository, it is more convenient to run the code inside the notebook; the exported ".py" code feels a bit raw. To run the notebook, you must have Jupyter Notebook or IPython Notebook installed. Open it by typing jupyter notebook or ipython notebook on the command line (from the folder containing the notebook once downloaded, or a parent folder). The notebook application (IDE) will then open in your browser as a local server, where you can open the .ipynb file and run code cells with CTRL+ENTER and SHIFT+ENTER; it is also possible to restart the kernel and run all cells at once from the menus. This is convenient because the IDE can be hosted on a cloud server with a lot of GPU power while you code through the browser.

Exercises. Note that the dataset changes depending on the exercise. Most of the time you will only have to edit the neural network's training parameters to succeed at an exercise, but at a certain point changes to the architecture itself will be required. The datasets used for these exercises are found in datasets.py.

Exercise 1. In theory, it is possible to create a perfect prediction of the signal for this exercise. The neural network's parameters have been set to acceptable values for a first training, so you may pass this exercise by running the code without any change. Your first training might produce predictions like those shown (in yellow), but it is possible to do a lot better with proper parameter adjustments. Note: the neural network sees only what is to the left of the chart and is trained to predict what is to the right (predictions in yellow). We have 2 time series to predict at once, and they are tied together, so our neural network processes multidimensional data. A simple example would be to receive as input the past values of multiple stock market symbols, whose values evolve together in time, in order to predict the future values of all those symbols. That is what we will do in exercise 6.

Exercise 2. Here, rather than 2 parallel signals to predict, we have only one, for simplicity. HOWEVER, this signal is a superposition of two sine waves of varying wavelength and offset (restricted to particular minimum and maximum wavelengths). To finish this exercise properly, you will need to edit the neural network's hyperparameters. As an example, here is what can be achieved as a prediction with these better (but still imperfect) training hyperparameters: nb_iters = 2500, batch_size = 50, hidden_dim = 35. Predictions are also shown for a bigger neural network with 3 stacked recurrent cells and a width of 500 hidden units per cell. Note that it would be possible to obtain better results with a smaller neural network, given better training hyperparameters, a longer training, added dropout, and so on.

Exercise 3. This exercise is similar to the previous one, except that the input data given to the encoder is noisy while the expected output is not, which makes the task a bit harder. The neural network is thus brought to denoise the signal in order to predict its future smooth values. Note that it would also have been possible to ask you to reconstruct the denoised signal from the noisy input rather than predict its future values; that architecture is called a "denoising autoencoder" and is also useful for tasks like data compression, such as for images.

Exercise 4. The 4th exercise is about editing the neural architecture to introduce feedback in the decoder, where outputs are fed anew to the next time step to be decoded. This could be compared to hearing one's own voice while speaking: have you ever noticed how disorienting speaking into a microphone can be at first? It is because of an offset in the timing of such a recurrence. Right now, our encoder and decoder use the same cell with two separate, different sets of weights, via the call to tf.nn.seq2seq.basic_rnn_seq2seq; to achieve the feedback we must change our code to not use that function. A simple way to make the edits is to call two different, separately named cells on the successive time steps (indexes) of the encoder and decoder lists; the __call__ function of the cells (that is, the parenthesis operator) can be used (a minimal sketch of this two-cell pattern appears at the end of this entry). You might find more details here:
- The section "Base interface for all RNN Cells": https://www.tensorflow.org/api_guides/python/contrib.rnn
- "tf.nn.seq2seq.basic_rnn_seq2seq", line 148 (as of April 2017): https://github.com/petewarden/tensorflow_ios/blob/master/tensorflow/python/ops/seq2seq.py#L148
- The comment "This builds an unrolled LSTM for tutorial purposes only.", line 143 (as of April 2017): https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L143
Although this replacement may seem merely formative, it is in this way that TensorFlow users can keep up with building more complicated neural architectures, such as plugging an attention RNN decoder on top of a CNN to convert an image into a textual description of it. To learn more about attention mechanisms in RNNs, you might want to watch this talk for the theory.

Exercise 5. This exercise is much harder than the previous ones and is offered more as a suggestion: predict the future value of Bitcoin's price. We have some daily market data of Bitcoin's value, namely BTC/USD and BTC/EUR. This is not enough to build a good predictor; data precise to at least the minute or second level would be more interesting. A prediction made on actual future values is shown: the neural network was not trained on the future values displayed, so this is a legitimate prediction, given a well-enough trained model. Disclaimer: this particular prediction of future values was unusually good, and you should not expect predictions to always be that good using so little data (side note: the other prediction charts in this project are all "average" except this one). Your task for this exercise is to plug the model into more valuable financial data in order to make more accurate predictions. I provided the code for the datasets in datasets.py, but it should be replaced to predict Bitcoin accurately. It would be possible to extend the input dimensions of the model, which currently accepts (BTC/USD, BTC/EUR). As an example, you could create additional input dimensions/streams containing weather data and more financial data, such as the S&P 500, the Dow Jones, and so on. Other, more creative input data could be sine waves (or other wave shapes such as saw waves or triangles, or paired cos/sin signals) representing the fluctuation of minutes, hours, days, weeks, months, years, and moon cycles. This could be combined with a Twitter sentiment analysis of the word "Bitcoin" in tweets, to add another input signal that is more human-based and abstract. Libraries exist to convert text to a sentiment value, and there would also be the end-to-end neural network approach (a much more complicated setup). It is also interesting to know where Bitcoin is most used: http://images.google.com/search?tbm=isch&q=bitcoin+heatmap+world With all the above-mentioned examples, the input features at every time step could be (BTC/USD, BTC/EUR, Dow_Jones, SP_500, hours, days, weeks, months, years, moons, meteo_USA, meteo_EUROPE, Twitter_sentiment), with two or more output features, such as (BTC/USD, BTC/EUR). This prediction concept applies to many things, such as weather prediction and other types of short-term and mid-term statistical predictions.

To change which exercise you are doing, change the value of the "exercise" variable.
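A minimal sketch of the two-cell decoder-feedback pattern described in Exercise 4, assuming the TensorFlow 1.x API of that era; the cell type, sizes, and the zero "GO" input are illustrative assumptions, not the project's reference solution:

```python
import tensorflow as tf  # TensorFlow 1.x API assumed

seq_length, input_dim, hidden_dim = 10, 2, 35  # illustrative sizes

# One placeholder per time step, matching the unrolled-list style
# of the legacy seq2seq helpers.
enc_inputs = [tf.placeholder(tf.float32, [None, input_dim])
              for _ in range(seq_length)]

# Encoder: its own cell, under its own variable scope.
with tf.variable_scope("encoder"):
    enc_cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
    _, enc_state = tf.nn.static_rnn(enc_cell, enc_inputs, dtype=tf.float32)

# Decoder: a second, separately parameterised cell, called step by step
# so each projected output can be fed back as the next input.
with tf.variable_scope("decoder"):
    dec_cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
    w_out = tf.get_variable("w_out", [hidden_dim, input_dim])
    b_out = tf.get_variable("b_out", [input_dim])

    outputs = []
    state = enc_state
    prev_output = tf.zeros_like(enc_inputs[0])  # zero "GO" input
    for t in range(seq_length):
        if t > 0:
            tf.get_variable_scope().reuse_variables()
        cell_out, state = dec_cell(prev_output, state)    # __call__ operator
        prev_output = tf.matmul(cell_out, w_out) + b_out  # feedback signal
        outputs.append(prev_output)
```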
Mar 01, 2018 · 43 views · 0 comments
Our Data is Voting Data
Mar 01, 2018 · 45 views · 0 comments
Fully-Connected Neural Nets
In the previous homework you implemented a fully-connected two-layer neural network on CIFAR-10. The implementation was simple but not very modular, since the loss and gradient were computed in a single monolithic function. This is manageable for a simple two-layer network, but would become impractical as we move to bigger models. Ideally we want to build networks using a more modular design so that we can implement different layer types in isolation and then snap them together into models with different architectures.

In this exercise we will implement fully-connected networks using a more modular approach. For each layer we will implement a forward and a backward function. The forward function will receive inputs, weights, and other parameters and will return both an output and a cache object storing data needed for the backward pass, like this:

```
def layer_forward(x, w):
    """ Receive inputs x and weights w """
    # Do some computations ...
    z = # ... some intermediate value
    # Do some more computations ...
    out = # the output

    cache = (x, w, z, out)  # Values we need to compute gradients
    return out, cache
```

The backward pass will receive upstream derivatives and the cache object, and will return gradients with respect to the inputs and weights, like this:

```
def layer_backward(dout, cache):
    """
    Receive derivative of loss with respect to outputs and cache,
    and compute derivative with respect to inputs.
    """
    # Unpack cache values
    x, w, z, out = cache

    # Use values in cache to compute derivatives
    dx = # Derivative of loss with respect to x
    dw = # Derivative of loss with respect to w

    return dx, dw
```

After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures. In addition to implementing fully-connected networks of arbitrary depth, we will also explore different update rules for optimization, and introduce Dropout as a regularizer and Batch Normalization as a tool to more efficiently optimize deep networks.
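As one concrete, runnable instance of this forward/backward contract (a simple affine layer; a sketch, not the assignment's reference solution):

```python
import numpy as np

def affine_forward(x, w):
    """Forward pass: out = x.dot(w); cache what the backward pass needs."""
    out = x.dot(w)
    cache = (x, w)
    return out, cache

def affine_backward(dout, cache):
    """Backward pass: map upstream derivatives to dx and dw."""
    x, w = cache
    dx = dout.dot(w.T)   # derivative of loss with respect to x
    dw = x.T.dot(dout)   # derivative of loss with respect to w
    return dx, dw

x = np.random.randn(4, 5)
w = np.random.randn(5, 3)
out, cache = affine_forward(x, w)
dx, dw = affine_backward(np.ones_like(out), cache)
assert dx.shape == x.shape and dw.shape == w.shape
```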
Mar 01, 2018 · 45 views · 0 comments
Animation based on: http://matplotlib.org/users/whats_new.html#display-hook-for-animations-in-the-ipython-notebook and http://louistiao.me/posts/notebooks/embedding-matplotlib-animations-in-jupyter-notebooks/
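Following those references, a minimal sketch of an inline animation using matplotlib's HTML5 display hook (requires ffmpeg to be installed; the sine-wave content is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation, rc

rc('animation', html='html5')  # render animations as HTML5 video in the notebook

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)
line, = ax.plot([], [])
x = np.linspace(0, 2 * np.pi, 200)

def update(frame):
    line.set_data(x, np.sin(x + 0.1 * frame))
    return (line,)

anim = animation.FuncAnimation(fig, update, frames=100, blit=True)
anim  # the cell's result is displayed as an embedded video
```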
Mar 01, 2018 · 31 views · 0 comments
Hoods centroids
Mar 01, 2018 · 24 views · 0 comments
Time Series Regression using Air Passengers Data
Mar 01, 2018 · 18 views · 0 comments
Things to do with the training set (sketched below):
- Identify columns with missing values and analyse them further.
- Check the relationships with the end (target) variable.
- Transform object columns to categorical variables via one-hot encoding.
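A small pandas sketch of those three steps (the toy frame and column names are assumptions):

```python
import pandas as pd

train = pd.DataFrame({
    "age": [22, None, 35, 41],
    "city": ["NY", "SF", "NY", "LA"],
    "target": [0, 1, 0, 1],
})

# 1. Identify columns with missing values for further analysis.
print(train.isnull().sum())

# 2. Check relationships with the end (target) variable.
print(train.groupby("city")["target"].mean())

# 3. Turn object columns into one-hot encoded indicator variables.
train = pd.get_dummies(train, columns=["city"])
print(train.head())
```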
Mar 01, 2018 · 21 views · 0 comments
This notebook has all the code required to emulate the results from a simulation of tumour growth shown in: Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A., & Sottoriva, A. (2016). Identification of neutral tumor evolution across cancer types. Nature Genetics. http://doi.org/10.1038/ng.3489 The notebook is laid out as follows: the first cell includes code to load the required packages and the module NeutralEvolution, which defines all types and functions used in the notebook. The following sections each have headings and short descriptions of the results generated, and produce figures equivalent to those in the publication. In the publication, outputs from the simulations were saved and figures were generated in R (v3); in this notebook we use the Gadfly package to produce inline plots.
Feb 28, 2018 · 16 views · 0 comments
Confidence Intervals
We have developed a method for estimating a parameter by using random sampling and the bootstrap. Our method produces an interval of estimates, to account for chance variability in the random sample. By providing an interval of estimates instead of just one estimate, we give ourselves some wiggle room. In the previous example we saw that our process of estimation produced a good interval about 95% of the time, a "good" interval being one that contains the parameter. We say that we are 95% confident that the process results in a good interval. Our interval of estimates is called a 95% confidence interval for the parameter, and 95% is called the confidence level of the interval. The situation in the previous example was a bit unusual. Because we happened to know the value of the parameter, we were able to check whether an interval was good or a dud, and this in turn helped us to see that our process of estimation captured the parameter about 95 out of every 100 times we used it. But usually, data scientists don't know the value of the parameter. That is the reason they want to estimate it in the first place. In such situations, they provide an interval of estimates for the unknown parameter by using methods like the one we have developed. Because of statistical theory and demonstrations like the one we have seen, data scientists can be confident that their process of generating the interval results in a good interval a known percent of the time.
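A minimal sketch of the process described here, bootstrapping an approximate 95% confidence interval for a sample median (the data and choice of statistic are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=500)  # hypothetical observed sample

# Resample with replacement many times, recording the statistic each time.
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5000)
])

# The middle 95% of the bootstrap distribution gives the interval of estimates.
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"Approximate 95% confidence interval for the median: [{lo:.2f}, {hi:.2f}]")
```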
Feb 27, 2018 · 15 views · 0 comments
this is a description
Feb 27, 2018 · 16 views · 0 comments
Optical simulation: wavefront simulation. With this module, you can run optical simulations. Wonderful! Reference page: Opticspy
Feb 26, 2018 · 291 views · 0 comments
Variational Inference: Bayesian Neural Networks. (c) 2016 by Thomas Wiecki. Original blog post: http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/
Feb 26, 2018 · 149 views · 0 comments
Table of Contents
1 Load libraries
2 Load previously trained model
3 Generate predictions
4 Find bounding boxes
5 Cut the images based on bounding box
6 Test set predictions
Feb 26, 2018 · 20 views · 0 comments
Introduction to ...
Feb 26, 2018 · 40 views · 0 comments
Model Data Services (MDS)
This notebook is an early demonstration showing some of the benefit from having model data exposed via MDS (OPeNDAP). Here, we explore model output without ever needing a copy of the data to be stored on the user's machine - instead, operations are generally lazy, and thus can minimise the amount of data requested from the upstream data provider.
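One common way to get this lazy, OPeNDAP-backed access pattern is xarray (the library choice, URL, and variable names here are assumptions for illustration; the notebook's actual stack isn't shown):

```python
import xarray as xr

# Hypothetical OPeNDAP endpoint; any DAP-served dataset works the same way.
url = "https://example.com/opendap/model/output.nc"

ds = xr.open_dataset(url)  # lazy: only metadata is fetched here

# Indexing stays lazy too; values are requested from the upstream provider
# only when they are actually needed.
subset = ds["temperature"].sel(time="2018-02-01").isel(depth=0)
print(subset.mean().values)  # triggers the (small) remote read
```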
Feb 26, 2018 · 27 views · 0 comments
pomegranate / hmmlearn comparison
hmmlearn is a Python module for hidden Markov models with a scikit-learn-like API. It was originally present in scikit-learn until its removal, due to structural learning not meshing well with the API of many other classical machine learning algorithms. Here is a table highlighting some of the similarities and differences between the two packages:

| Feature | pomegranate | hmmlearn |
| --- | --- | --- |
| Graph Structure | | |
| Silent States | ✓ | |
| Optional Explicit End State | ✓ | |
| Sparse Implementation | ✓ | |
| Arbitrary Emissions Allowed on States | ✓ | |
| Discrete/Gaussian/GMM Emissions | ✓ | ✓ |
| Large Library of Other Emissions | ✓ | |
| Build Model from Matrices | ✓ | ✓ |
| Build Model Node-by-Node | ✓ | |
| Serialize to JSON | ✓ | |
| Serialize using Pickle/Joblib | | ✓ |
| Algorithms | | |
| Priors | | ✓ |
| Sampling | ✓ | ✓ |
| Log Probability Scoring | ✓ | ✓ |
| Forward-Backward Emissions | ✓ | ✓ |
| Forward-Backward Transitions | ✓ | |
| Viterbi Decoding | ✓ | ✓ |
| MAP Decoding | ✓ | ✓ |
| Baum-Welch Training | ✓ | ✓ |
| Viterbi Training | ✓ | |
| Labeled Training | ✓ | |
| Tied Emissions | ✓ | |
| Tied Transitions | ✓ | |
| Emission Inertia | ✓ | |
| Transition Inertia | ✓ | |
| Emission Freezing | ✓ | ✓ |
| Transition Freezing | ✓ | ✓ |
| Multi-threaded Training | ✓ | Coming Soon |

Just because two features are both implemented doesn't speak to how fast they are. Below we investigate how fast the two packages are in the different settings both have implemented.

Fully Connected Graphs with Multivariate Gaussian Emissions. Let's look at the sample scoring method, Viterbi, and Baum-Welch training for fully connected graphs with multivariate Gaussian emissions. A fully connected graph is one where all states have connections to all other states. This is a case where pomegranate is expected to do poorly due to its sparse implementation, and hmmlearn should shine due to its vectorized implementations.
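A sketch of the kind of head-to-head setup being benchmarked, assuming the APIs of the 2017-era releases (pomegranate ~0.7, hmmlearn ~0.2); the sizes and random data are illustrative:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from pomegranate import HiddenMarkovModel, MultivariateGaussianDistribution

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)  # one sequence: 1000 observations, 5 dimensions

# hmmlearn: dense fully connected model with multivariate Gaussian emissions.
hl_model = GaussianHMM(n_components=4, covariance_type="full", n_iter=10)
hl_model.fit(X)
print("hmmlearn log-probability:", hl_model.score(X))

# pomegranate: initialise a comparable dense model directly from the data.
pg_model = HiddenMarkovModel.from_samples(
    MultivariateGaussianDistribution, n_components=4, X=[X], max_iterations=10
)
print("pomegranate log-probability:", pg_model.log_probability(X))
```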
Feb 26, 2018 · 597 views · 0 comments
When will the Arctic be ice-free during summer?
One can look at the sea ice extent (first part of this notebook) or the sea ice volume (second part of the notebook). The least amount of ice during the year occurs in September, so we plot the September sea ice extent (in square km) for 1979-2014. Original data with a detailed description is available at: http://nsidc.org/data/docs/noaa/g02135_seaice_index/ We are following section 3, subsection "Monthly Sea Ice Extent Anomaly Graphs".
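A sketch of the September plot, assuming a local copy of the NSIDC monthly extent file (the filename and column names are assumptions about that file's layout):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("N_09_extent_v3.0.csv", skipinitialspace=True)

sept = df[(df["year"] >= 1979) & (df["year"] <= 2014)]
plt.plot(sept["year"], sept["extent"], marker="o")
plt.xlabel("Year")
plt.ylabel("September sea ice extent (million sq km)")
plt.title("Arctic September sea ice extent, 1979-2014")
plt.show()
```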
Feb 25, 2018 · 24 views · 0 comments
A bit of searching for all the digits...
Feb 25, 2018 · 59 views · 0 comments
Data Carpentry Reproducible Research Workshop - Data Exploration
Feb 25, 2018 · 18 views · 0 comments
The Sound of Hydrogen
Feb 24, 2018 · 34 views · 0 comments
How to break a neural network
Feb 24, 2018 · 35 views · 0 comments
Data Analysis and Visualization
For today's workshop we will be using the pandas, matplotlib, and seaborn libraries. Also, we will read data from the web with pandas-datareader. By the end of the workshop, participants should be able to use Python to tell a story about a dataset they build from an open data source. Goals:
- Understand the basic functionality of the pandas DataFrame
- Use Matplotlib to visualize data
- Use Seaborn to explore data
- Import data from the web with pandas-datareader and compare development indicators from the World Bank (sketched below)
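For the last goal, a minimal sketch of pulling a World Bank development indicator with pandas-datareader (the indicator code and countries are illustrative choices):

```python
from pandas_datareader import wb

# GDP per capita (constant US$) for a few countries.
gdp = wb.download(
    indicator="NY.GDP.PCAP.KD",
    country=["US", "MX", "BR"],
    start=2000,
    end=2015,
)

# Reshape to one column per country, then plot the comparison.
gdp = (gdp.reset_index()
          .pivot(index="year", columns="country", values="NY.GDP.PCAP.KD"))
gdp.plot(title="GDP per capita (constant US$)")
```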
Feb 24, 2018 · 68 views · 0 comments
Area Charts
Feb 24, 2018 · 22 views · 0 comments
Jupyter Octave Kernel
Interact with Octave in the notebook. All commands are interpreted by Octave. Since this is a MetaKernel, a standard set of magics is available. Help on commands is available using the %help magic or using ? with a command.
Feb 24, 2018 · 51 views · 0 comments
How long a series to maximize expected winnings
Riddler Classic, 2017-07-14: https://fivethirtyeight.com/features/can-you-eat-more-pizza-than-your-siblings/ Congratulations! The Acme Axegrinders, which you own, are the regular season champions of the National Squishyball League (NSL). Your team will now play a championship series against the Boondocks Barbarians, which had the second-best regular season record. You feel good about Acme's chances in the series because Acme won exactly 60 percent of the hundreds of games it played against Boondocks this season. (The NSL has an incredibly long regular season.) The NSL has two special rules for the playoffs:
1. The owner of the top-seeded team (i.e., you) gets to select the length of the championship series in advance of the first game, so you could decide to play a single game, a best two out of three series, a three out of five series, etc., all the way up to a 50 out of 99 series.
2. The owner of the winning team gets $1 million minus $10,000 for each of the victories required to win the series, regardless of how many games the series lasts in total. Thus, if the top-seeded team's owner selects a single-game championship, the winning owner will collect $990,000. If he or she selects a 4 out of 7 series, the winning team's owner will collect $960,000. The owner of the losing team gets nothing.
Since Acme has a 60 percent chance of winning any individual game against Boondocks, Rule 1 encourages you to opt for a very long series to improve Acme's chances of winning the series. But Rule 2 means that a long series will mean less winnings for you if Acme does take the series. How long a series should you select in order to maximize your expected winnings? And how much money do you expect to win? A computational answer is sketched below.
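A direct computation (a sketch; p = 0.6 per the puzzle, and the probability of taking a first-to-n series is a negative binomial tail summed over the loser's win count):

```python
from math import comb  # Python 3.8+

p = 0.6  # Acme's per-game win probability

def expected_winnings(n):
    """Expected prize for a first-to-n series: $1M minus $10k per required win."""
    # P(Acme reaches n wins before Boondocks does): sum over the number of
    # games k that Boondocks wins before Acme's nth win.
    p_series = sum(comb(n - 1 + k, k) * p**n * (1 - p)**k for k in range(n))
    return p_series * (1_000_000 - 10_000 * n)

best_n = max(range(1, 51), key=expected_winnings)
print(best_n, round(expected_winnings(best_n), 2))
```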
Feb 24, 2018 · 17 views · 0 comments
Exploring DrivePro GPS format
The Transcend DrivePro 220 exports its videos in QuickTime MOV format, in a way that also includes GPS information for every second of the video. This information can be viewed using their Windows and/or Mac apps, but not exported. This notebook will attempt to get to the bottom of how the GPS data is stored, so I can use this dashcam to provide data to the OpenStreetView project.
Feb 24, 2018 · 40 views · 0 comments
Caffe2 setup and loading of the model
Feb 23, 2018 · 28 views · 0 comments
Distributed DataFrames with Dask
Feb 23, 2018 · 27 views · 0 comments
Epilepsy Comorbidity Analysis using SCAIView
This notebook contains the quantification of gene overlap comparing epilepsy with other disorders using text mining. Authors: Daniel Domingo-Fernández and Charles Tapley Hoyt. The following set of queries was used in this analysis.

Reference queries:
[MeSH Disease:"Epilepsy"]
[MeSH Disease:"Alzheimer Disease"]
[MeSH Disease:"Tuberculosis"]
[MeSH Disease:"Parkinson Disease"]
[MeSH Disease:"Dementia"]
[MeSH Disease:"Migraine Disorders"]
[MeSH Disease:"Diabetes Mellitus"]
[MeSH Disease:"Colonic Neoplasms"]
[MeSH Disease:"Pulmonary Disease Chronic Obstructive"]
[MeSH Disease:"Peptic Ulcer"]
[MeSH Disease:"Anxiety Disorders"]
[MeSH Disease:"Urinary Incontinence"]
[MeSH Disease:"Cataract"]
[MeSH Disease:"Hypertension"]
[MeSH Disease:"Arthritis"]

Queries used for calculating pleiotropy rates:
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Alzheimer Disease"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Parkinson Disease"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Dementia"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Migraine Disorders"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Diabetes Mellitus"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Colonic Neoplasms"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Pulmonary Disease Chronic Obstructive"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Anxiety Disorders"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Urinary Incontinence"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Cataract"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Hypertension"]
[MeSH Disease:"Epilepsy"] AND [MeSH Disease:"Arthritis"]

The queries were retrieved using SCAIView version 1.7.3, corresponding to the indexing of MEDLINE on 2016-07-14T13:50:07.797575Z. Note that the reference queries might take time, since thousands of articles need to be analyzed.
Feb 23, 2018 · 15 views · 0 comments
*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).* The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!
Feb 23, 2018 · 28 views · 0 comments
Implementation of Salisman's Don't Overfit submission
Feb 23, 2018 · 54 views · 0 comments
Basic Vega-Lite Example
Feb 23, 2018 · 33 views · 0 comments
I haven't posted much content over the past year as I've been quite preoccupied with other activities. It's time to change. The number and quality of data visualization tools available on the web has increased markedly over the past few years. It is time to bring the presentation of my work into the new era. To this end I'm retroactively going to update my previous blog posts to be more easily read and used. Starting from today each of my blog posts will be a standalone Jupyter notebook. This hopefully will allow people to work through the same process I did in achieving a given result. Additionally, my posts which contained static figures output from matplotlib will now be rendered dynamically using tools such as Plotly. What does this mean in practice? Let's quickly take a look.
Feb 22, 2018 · 18 views · 1 comment
General Assembly Breast Cancer Project by Brendan Bailey
Feb 22, 2018 · 29 views · 0 comments
Content under Creative Commons Attribution license CC-BY 4.0, code under MIT license (c)2014 L.A. Barba, G.F. Forsyth, C. Cooper. Based on CFDPython, (c)2013 L.A. Barba, also under CC-BY license.
Feb 22, 2018 · 37 views · 0 comments
Self-study from: https://www.kaggle.com/arthurtok/introduction-to-ensembling-stacking-in-python
Feb 22, 2018 · 19 views · 0 comments
Build a language detector model
The goal of this exercise is to train a linear classifier on text features that represent sequences of up to 3 consecutive characters, so as to recognize natural languages by using the frequencies of short character sequences as 'fingerprints'. Author: Olivier Grisel [email protected] License: Simplified BSD
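A sketch of such a classifier with scikit-learn (character 1- to 3-grams feeding a linear model; the three toy sentences stand in for the exercise's real training corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import Pipeline

# Character 1- to 3-grams act as per-language 'fingerprints'.
clf = Pipeline([
    ("vec", TfidfVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("clf", Perceptron()),
])

docs = ["the quick brown fox", "le renard brun rapide", "der schnelle braune Fuchs"]
labels = ["en", "fr", "de"]

clf.fit(docs, labels)
print(clf.predict(["ceci est un petit texte"]))  # expect 'fr'
```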
Feb 22, 2018 · 20 views · 0 comments
Tak exploration
We want to build an AI for Tak, a simple game inspired by the Kingkiller Chronicles by Patrick Rothfuss. I'm not an AI expert (I'm not even a novice), so this will be an exploration: an attempt to develop a game AI, building from a small background.
Feb 22, 2018 · 19 views · 0 comments
Generate initial state
We use the code created by the ion trapping group to compute the equilibrium positions of ions in the Penning trap. First we import the mode_analysis_code module:
Feb 22, 2018 · 20 views · 0 comments
Generate initial state
We use the code created by the ion trapping group to compute the equilibrium positions of ions in the Penning trap. First we import the mode_analysis_code module:
Feb 22, 2018 · 2 views · 0 comments
Decision tree (categorical)
Let's try to construct a decision tree for the weather data. If we stick to scikit, unfortunately, we immediately hit a big limitation of its decision tree implementation: scikit decision trees only work on numeric data! Our weather data, however, is categorical, so we now need to do the attribute encoding that was discussed earlier. We cannot simply replace strings with numbers (i.e., "Sunny" = 1, "Rainy" = 2, etc.), since scikit would treat these values as numbers, but our data has no ordering. If our data were ordinal, we could do this, since it would make sense (i.e., "Worst" = -2, "Neutral" = 0, "Best" = 2, for example). So we have to encode our values using a one-hot encoding. For this, we can use a variety of approaches; let's use pandas for now (sketched below).
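A sketch with pandas and scikit-learn (the five-row weather slice is made up for illustration):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy"],
    "windy":   [False, True, False, False, True],
    "play":    ["no", "no", "yes", "yes", "no"],
})

# One-hot encode the categorical attribute; unlike integer codes, this
# imposes no artificial ordering on the categories.
X = pd.get_dummies(df[["outlook", "windy"]])
y = df["play"]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict(X[:2]))
```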
Feb 22, 2018 · 21 views · 0 comments
New try
Feb 22, 2018 · 26 views · 0 comments
Hello NetworkX and Planarity. Links: NetworkX homepage; Planarity's GitHub page; pre-generated graph DB (on Google Drive).
Feb 22, 2018 · 21 views · 0 comments
Face Generation
In this project, you'll use generative adversarial networks to generate new images of faces. Get the Data: you'll be using two datasets in this project, MNIST and CelebA. Since the CelebA dataset is complex and you're doing GANs in a project for the first time, we want you to test your neural network on MNIST before CelebA. Running the GANs on MNIST will allow you to see how well your model trains sooner. If you're using FloydHub, set data_dir to "/input" and use the FloydHub data ID "R5KrjnANiKVhLWAkpXhNBe".
Feb 21, 2018 · 36 views · 0 comments
EDUCATION AND THE LABOUR MARKET
We considered three variables: the employment rate for ages 20-64, the graduate rate for ages 30-34, and the rate of over-qualified workers, all relating to the year 2015. The objective is to evaluate, in terms of the matching of labour supply and demand, whether having a degree guarantees employment commensurate with that qualification.
Feb 21, 2018 · 16 views · 0 comments
In this lab we'll look at images as data. In particular, we'll show how to use standard Python libraries to visualize and distort images, switch between matrix and vector representations, and run machine learning to perform a classification task. We'll be using the ever-popular MNIST digit recognition dataset. This will be a two-part lab, comprising the following steps (the vector/matrix switch from step 2 is sketched below):
Part 1:
1. Generate random matrices and visualize them as images
2. Data prep and exploration: load MNIST data; convert the vectors to a matrix and visualize; look at pixel distributions; heuristic feature reduction; split data into train and validation
Part 2:
3. Classification: one-vs-all using logistic regression; random forests on all
4. Error analysis: visualize confusion matrix; look at specific errors
5. Generate synthetic data and rerun RF
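The vector/matrix switch in a few lines (a sketch with a random stand-in for an MNIST row):

```python
import numpy as np
import matplotlib.pyplot as plt

vec = np.random.rand(784)        # a flat MNIST-style pixel vector

img = vec.reshape(28, 28)        # vector -> matrix, for visualization
plt.imshow(img, cmap="gray")
plt.axis("off")
plt.show()

assert np.array_equal(img.ravel(), vec)  # matrix -> vector round trip
```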
Feb 21, 2018 · 17 views · 0 comments
Welcome to an example Binder
Feb 21, 2018 · 28 views · 0 comments