Cheat Sheet Sklearn

Scikit Learn Cheat Sheet Pdf
Sklearn Cheat Sheet Pdf
Cheat Sheet Machine Learning
Datacamp Python Cheat Sheet

I hope this short tutorial and cheat sheet is helpful for your scikit-learn journey. These methods will make your work as a data scientist much smoother and simpler as you continue to learn these powerful tools. There is still a lot to learn about scikit-learn and the other Python ML libraries. Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, and provides simple and efficient tools for data mining and data analysis. The goal of this series is to provide introductions, highlights, and demonstrations of how to use the must-have libraries so you can pick what to explore in more depth. Scikit-learn is built on top of NumPy, SciPy, and matplotlib. It contains an extensive collection of ready-to-use machine learning algorithms. Jupyter Notebook Cheat Sheet September 19th, 2017 This Jupyter Notebook cheat sheet will help you to find your way around the well-known Jupyter Notebook App, a subproject of Project Jupyter.

Have a little time to learn Tensorflow 2.0 with your Machine Learning? In this article, I have put together the 10 best Tensorflow cheat sheets for you to hang on the wall above your desk. Whenever you need a reference, keep these handy cheat sheets available!!

Cheat Sheet 1: BecomingHuman.AI

becominghuman.ai has multiple cheat sheets but this one I have found to be one of the best. Easy to read and understand, this cheat sheet is great for beginners and advanced Tensorflow learners alike!

Pros: Rated ‘E’ for everyone.

Cons: Color can be distracting

Cheat Sheet 2: Altoros

Altoros is another great website to find cheat sheets for machine learning!! They cover a wide range of subjects. This Tensorflow cheat sheet is 3 pages long, but it is totally worth it for the wealth of information you receive.

Pros: Easy to read and understand.

Cons: Condensed information can be difficult for some readers.

Cheat Sheet 3: Tech Republic

This cheat sheet from Tech Republic is chock full of information for you, including an introduction, how to begin, top competitors, and additional resources. This is great if you are looking to become an IT pro.

Pros: Great for those who are looking to upgrade their skills.

Cons: It is a lot of reading material (8 pages, condensed)

Cheat Sheet 4: Dummies

Sometimes, the best way to learn is from a dummy!! Tensorflow for dummies is a great way to get an introduction to Tensorflow, what it is and how it works. Perfect for beginners in machine learning!

Pros: Great information for beginners

Cons: Condensed information, no working examples

Cheat Sheet 5: TensorFlow

To learn Tensorflow, one must go to Tensorflow.org! This website and cheat sheet will teach you everything you need to learn Tensorflow correctly and effectively. There is a lot of material here, so be prepared for a lengthy read! Great for beginners and advanced Tensorflowers!

Pros: Rated ‘E’ for everyone, the best way to Tensorflow.

Cons: Can be confusing to the absolute beginner.

Cheat Sheet 6: Github

This is a really great cheat sheet written by Patrick on Github! He shows examples and syntax. This is a great sheet for all Tensorflow learners!!

Pros: Rated ‘E’ for everyone.

Cons: None that I can see.

Cheat Sheet 7: Stanford.edu

This cheat sheet shows you the ins and outs of Tensorflow: what it is, how it works, and how it compares to other data science tools. Easily readable for beginners and advanced Tensorflow users alike.

Pros: Easy to understand

Cons: None that I can see

Cheat Sheet 8: Tensorflow Core

This cheat sheet is from Tensorflow Core. It shows API documentation for Tensorflow in Python, alongside other languages!! For the method you are trying to use, it shows the correct explanation and syntax.

Pros: Easy to read, rated ‘E’ for everyone.

Cons: None that I can see.

Cheat Sheet 9: HackerNoon

To understand Tensorflow, you must understand Deep Learning. This cheat sheet is one to keep handy as a dog-eared reference in the desk drawer or right next to your working laptop.

Camron will take you from beginning to end, making Tensorflow and deep learning easier to understand with explanations plus the best resources. This is a great resource for those who are serious about looking into Data Science as a career with Python and Tensorflow.

Pros: Tons of resources, rated ‘E’ for everyone.

Cons: A lot of research materials and reading.

Cheat Sheet 10: Cheatography

This cheat sheet will show you the types of models from machine learning you can build with Tensorflow!! It has graphics, explanations and examples on what you need to know for Tensorflow, Machine and Deep Learning!

Pros: Rated ‘E’ for everyone

Cons: None that I can see.

Thank you for joining me on another journey to find the top 10 best cheat sheets on Tensorflow! I hope that they are useful to you on your journey in Deep Learning and Tensorflow!

By Andre Ye, Cofounder at Critiq, Editor & Top Writer at Medium.

There are several areas of data mining and machine learning that will be covered in this cheat-sheet:

Predictive Modelling. Regression and classification algorithms for supervised learning (prediction), metrics for evaluating model performance.
Clustering. Methods to group data without a label into clusters: K-Means, selecting the number of clusters based on objective metrics.
Dimensionality Reduction. Methods to reduce the dimensionality of data and attributes of those methods: PCA and LDA.
Feature Importance. Methods to find the most important feature in a dataset: permutation importance, SHAP values, Partial Dependence Plots.
Data Transformation. Methods to transform the data for greater predictive power, for easier analysis, or to uncover hidden relationships and patterns: standardization, normalization, box-cox transformations.

All images were created by the author unless explicitly stated otherwise.

Predictive Modelling

Train-test-split is an important part of testing how well a model performs: the model is trained on designated training data and tested on designated testing data. This way, the model’s ability to generalize to new data can be measured. In sklearn, lists, pandas DataFrames, and NumPy arrays are all accepted as the X and y parameters.
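As a minimal sketch of such a split, the example below uses sklearn's train_test_split on made-up data. The toy arrays, the 20% test size, and the random_state value are assumptions chosen for illustration, not values from the original article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 3 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

The four returned objects keep the row alignment between features and targets, so X_train[i] always corresponds to y_train[i].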

Training a standard supervised learning model takes the form of an import, the creation of an instance, and the fitting of the model.
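The import, instance, fit pattern can be sketched as follows. The choice of RandomForestClassifier and the iris dataset is an assumption for illustration; any sklearn estimator follows the same three steps.

```python
# Step 1: import the model class (and, here, a small built-in dataset).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Step 2: create an instance, setting hyperparameters in the constructor.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Step 3: fit the model to the training data.
model.fit(X, y)

# Once fitted, the model can predict labels for new rows.
predictions = model.predict(X[:5])
```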

sklearn classifier models are listed below, with the branch highlighted in blue and the model name in orange.

sklearn regressor models are listed below, with the branch highlighted in blue and the model name in orange.

Evaluating model performance is done with train-test data in this form:
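A minimal sketch of this evaluation pattern, assuming a classification task scored with accuracy_score; the LogisticRegression model and the iris dataset are illustrative choices, and any metric from sklearn.metrics can be substituted in the last step.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training portion only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict on the held-out portion and compare against the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
```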

sklearn metrics for classification and regression are listed below, with the most commonly used metric marked in green. Many of the grey metrics are more appropriate than the green-marked ones in certain contexts. Each has its own advantages and disadvantages, balancing priority comparisons, interpretability, and other factors.

Clustering

Before clustering, the data needs to be standardized (information for this can be found in the Data Transformation section). Clustering is the process of creating clusters based on point distances.

Training and creating a K-Means clustering model creates a model that can cluster and retrieve information about the clustered data.
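A minimal sketch of fitting K-Means, assuming three synthetic, well-separated blobs as the data. The blob positions, n_clusters=3, and n_init=10 are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three 2-D blobs centred at 0, 5 and 10.
rng = np.random.default_rng(0)
data = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0, 10.0)]
)

# Standardize first so every feature contributes equally to distances.
data = StandardScaler().fit_transform(data)

# Fit K-Means; the number of clusters must be chosen up front.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(data)
```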

The cluster label of each data point is available from the fitted model’s labels_ attribute.

Similarly, the label of each data point can be stored in a new column of the data.

The cluster label of new data can be obtained with the model’s predict method. The new_data can be in the form of an array, a list, or a DataFrame.

The cluster centers are available as a two-dimensional array from the model’s cluster_centers_ attribute.
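The four operations described above (reading labels, storing them in a column, labelling new data, and reading cluster centers) can be sketched together. The DataFrame, its column names, and the new_data values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: two 2-D blobs in a DataFrame.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0)]),
    columns=["x1", "x2"],
)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Cluster label of every training point, in row order.
labels = model.labels_

# Store the labels in a new column (on a copy, so the features stay untouched).
clustered = data.copy()
clustered["cluster"] = model.labels_

# Assign new points to the nearest learned cluster.
new_data = pd.DataFrame([[0.1, -0.2], [5.2, 4.9]], columns=["x1", "x2"])
new_labels = model.predict(new_data)

# One row per cluster, one column per feature.
centers = model.cluster_centers_
```

Writing the labels to a copy avoids accidentally feeding the cluster column back in as a feature if the model is refitted later.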

To find the optimal number of clusters, use the silhouette score, which is a metric of how well a certain number of clusters fits the data. For each number of clusters within a predefined range, a K-Means clustering algorithm is trained, and its silhouette score is saved to a list (scores). data is the x that the model is trained on.

After the scores are saved to the list scores, they can be plotted or searched programmatically to find the highest one.
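The silhouette search can be sketched as a loop over candidate cluster counts. The synthetic three-blob data and the range of 2 to 6 clusters are assumptions for illustration; scores and data follow the names used in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = [(0, 0), (6, 6), (0, 6)]
data = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in blob_centers]
)

# Silhouette score needs at least 2 clusters.
cluster_range = range(2, 7)
scores = []
for k in cluster_range:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    scores.append(silhouette_score(data, model.labels_))

# Pick the cluster count with the highest silhouette score.
best_k = cluster_range[int(np.argmax(scores))]
```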

Dimensionality Reduction

Dimensionality reduction is the process of expressing high-dimensional data in a reduced number of dimensions such that each one retains as much information as possible. Dimensionality reduction may be used for visualization of high-dimensional data or to speed up machine learning models by removing low-information or correlated features.

Principal Component Analysis, or PCA, is a popular method of reducing the dimensionality of data by drawing several orthogonal (perpendicular) vectors in the feature space to represent the reduced number of dimensions. The variable number represents the number of dimensions the reduced data will have. In the case of visualization, for example, it would be two dimensions.

Visual demonstration of how PCA works. Source.

Fitting the PCA Model: The .fit_transform function automatically fits the model to the data and transforms it into a reduced number of dimensions.
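A minimal sketch of fitting PCA with fit_transform, assuming the iris dataset and a target of two dimensions. The variable number follows the name used in the text; standardizing beforehand is an added assumption, since PCA is sensitive to feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# number is the dimensionality of the reduced data (2 for a scatter plot).
number = 2
model = PCA(n_components=number)

# Fit the components and project the data in one call.
reduced = model.fit_transform(X_scaled)
```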

Explained Variance Ratio: Calling model.explained_variance_ratio_ will yield an array where each item corresponds to that dimension’s “explained variance ratio,” which essentially means the fraction of the information (variance) in the original data represented by that dimension. The sum of the explained variance ratios is the total fraction of information retained in the reduced dimensionality data.

PCA Feature Weights: In PCA, each newly created feature is a linear combination of the former data’s features. These linear weights can be accessed with model.components_, and are a good indicator of feature importance (a larger absolute weight indicates that the feature contributes more to that component).
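Both attributes can be inspected on a fitted PCA model, as in this sketch. The iris dataset and the two-component setting are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
model = PCA(n_components=2).fit(X_scaled)

# One ratio per component, sorted from most to least variance explained.
ratios = model.explained_variance_ratio_

# Total fraction of the original variance kept after the reduction.
retained = ratios.sum()

# Linear weights: one row per component, one column per original feature.
weights = model.components_
```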

Linear Discriminant Analysis (LDA, not to be confused with Latent Dirichlet Allocation) is another method of dimensionality reduction. The primary difference between LDA and PCA is that LDA is a supervised algorithm, meaning it takes into account both x and y. Principal Component Analysis only considers x and is hence an unsupervised algorithm.

PCA attempts to maintain the structure (variance) of the data purely based on distances between points, whereas LDA prioritizes clean separation of classes.
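The supervised nature of LDA shows up in its call signature: unlike PCA, fit_transform takes the labels as well. This is a minimal sketch on the iris dataset, which is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# n_components can be at most (number of classes - 1), so 2 for iris.
lda = LinearDiscriminantAnalysis(n_components=2)

# y is required: the projection is chosen to separate the classes.
reduced = lda.fit_transform(X, y)
```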

Feature Importance

Feature Importance is the process of finding the most important feature to a target. Through PCA, the feature that contains the most information can be found, but feature importance concerns a feature’s impact on the target. A change in an ‘important’ feature will have a large effect on the y-variable, whereas a change in an ‘unimportant’ feature will have little to no effect on the y-variable.

Permutation Importance is a method to evaluate how important a feature is. A single model is trained, and then the values of one column at a time are randomly shuffled in the validation data. The corresponding decrease in model accuracy as a result of the scrambled column represents how important the column is to the model’s predictive power. The eli5 library is used for Permutation Importance.
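The article relies on eli5; as a dependency-free sketch, the example below swaps in scikit-learn's own sklearn.inspection.permutation_importance, which shuffles one column at a time on held-out data and records the drop in score. The synthetic data, in which only the first column drives the target, is an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: only column 0 influences the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Shuffle each column 10 times on the validation set and average the score drop.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
importances = result.importances_mean
```

Values near zero (or below) suggest a column the model does not rely on.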

In the data that this Permutation Importance model was trained on, the column lat has the largest impact on the target variable (in this case, the house price). Permutation Importance is a good method to use when deciding which features to remove (correlated or redundant features that actually confuse the model, marked by negative permutation importance values) for best predictive performance.

SHAP is another method of evaluating feature importance, borrowing Shapley values from cooperative game theory to estimate how much value each player (here, each feature) contributes. Unlike permutation importance, SHapley Additive exPlanations use a more formulaic and calculation-based method for evaluating feature importance. SHAP’s fast tree explainer requires a tree-based model (Decision Tree, Random Forest), and SHAP accommodates both regression and classification.

PD(P) Plots, or partial dependence plots, are a staple in data mining and analysis, showing how certain values of one feature influence a change in the target variable. Imports required include pdpbox for the dependence plots and matplotlib to display the plots.

Isolated PDPs: the following code displays the partial dependence plot, where feat_name is the feature within X that will be isolated and compared to the target variable. The second line of code saves the data, whereas the third constructs the canvas to display the plot.

The partial dependence plot shows the effect of certain values and changes in the number of square feet of living space on the price of a house. Shaded areas represent confidence intervals.

Contour PDPs: Partial dependence plots can also take the form of contour plots, which compare not one isolated variable but the relationship between two isolated variables. The two features that are to be compared are stored in a variable compared_features.

The relationship between the two features shows the corresponding price when only considering these two features. Partial dependence plots are chock-full of data analysis and findings, but be conscious of large confidence intervals.
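The article uses pdpbox for plotting; as a sketch of the underlying computation only, the example below swaps in scikit-learn's sklearn.inspection.partial_dependence to compute the averaged predictions for one isolated feature and for a pair of features. The synthetic data, the GradientBoostingRegressor model, and the grid resolutions are assumptions, and no plot is drawn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Hypothetical data: the target rises with feature 0 and, more weakly, feature 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.05, size=200)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Isolated PDP: average prediction as feature 0 sweeps over a 20-point grid.
isolated = partial_dependence(
    model, X, features=[0], grid_resolution=20, kind="average"
)

# Two-feature PDP: average prediction over a 10 x 10 grid of features 0 and 1,
# which is the data behind a contour plot.
pairwise = partial_dependence(
    model, X, features=[0, 1], grid_resolution=10, kind="average"
)

isolated_curve = isolated["average"][0]   # shape (20,)
pairwise_grid = pairwise["average"][0]    # shape (10, 10)
```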

Data Transformation

Standardizing or scaling is the process of ‘reshaping’ the data such that it contains the same information but has a mean of 0 and a variance of 1. Many algorithms behave better numerically when all features share a common scale.

The transformed_data is standardized and can be used for many distance-based algorithms such as Support Vector Machine and K-Nearest Neighbors. The results of algorithms that use standardized data need to be ‘de-standardized’ so they can be properly interpreted. .inverse_transform() can be used to perform the opposite of standard transforms.
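A minimal sketch of standardizing and then undoing the transform, assuming a small made-up array. transformed_data follows the name used in the text; the other names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
data = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])

scaler = StandardScaler()

# Each column now has mean 0 and variance 1.
transformed_data = scaler.fit_transform(data)

# inverse_transform maps standardized values back to the original units.
restored = scaler.inverse_transform(transformed_data)
```

Keeping the fitted scaler object around is what makes the inverse step possible, since it stores the per-column mean and scale.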

Normalizing data puts it on a 0 to 1 scale, something that, similar to standardized data, makes the data mathematically easier to use for the model.

Neither normalizing nor standardizing changes the shape of the data’s distribution; the difference is that normalizing restricts the data to fixed boundaries, while standardizing leaves it unbounded. Whether to normalize or standardize data depends on the algorithm and the context.
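Normalizing to a 0 to 1 scale can be sketched with sklearn's MinMaxScaler. The toy array is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two features on very different scales.
data = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])

# By default each column is rescaled so its minimum is 0 and its maximum is 1.
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
```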

Box-cox transformations involve raising the data to various powers to transform it. Box-cox transformations can normalize data, make it more linear, or decrease the complexity. These transformations don’t only involve raising the data to powers but also fractional powers (square rooting) and logarithms.

For instance, consider data points situated along the function g(x). By applying the logarithm box-cox transformation, the data can be easily modelled with linear regression.

Created with Desmos.

sklearn automatically estimates, for each feature, the Box-Cox power parameter that makes the transformed data most closely resemble a normal distribution.

Because Box-Cox transformations involve logarithms and fractional powers such as square roots, Box-Cox transformed data must be strictly positive (shifting the data into a strictly positive range beforehand can take care of this). For data with negative data points as well as positive ones, set method='yeo-johnson' for a similar approach to making the data more closely resemble a bell curve.
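Both variants are available through sklearn's PowerTransformer, as in this sketch. The log-normal sample and the shifted copy containing negative values are assumptions made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical skewed, strictly positive data.
rng = np.random.default_rng(0)
positive = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

# Box-Cox: requires strictly positive input; the power is estimated per feature.
box_cox = PowerTransformer(method="box-cox")
positive_t = box_cox.fit_transform(positive)

# Shifting the data down introduces negative values, which Box-Cox rejects.
mixed = positive - 1.0

# Yeo-Johnson handles zero and negative values as well.
yeo = PowerTransformer(method="yeo-johnson")
mixed_t = yeo.fit_transform(mixed)
```

The estimated power for each feature is stored in the fitted transformer's lambdas_ attribute, and by default the output is also standardized to zero mean and unit variance.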

Original. Reposted with permission.
