
If the input data is from an Alteryx data stream, then the open-source R lm function and the glmnet and cv.glmnet functions (from the glmnet package) are used for model estimation. If the input data comes from either an XDF Output tool or XDF Input tool, then the RevoScaleR rxLinMod function is used for model estimation. The advantage of the RevoScaleR-based function is that it can analyze much larger (out-of-memory) datasets, but at the cost of additional overhead to create an XDF file and the inability to create some of the model diagnostic output that is available with the open-source R functions.

Model name: Enter a name for the model to identify it when it is referenced in other tools. Model names must start with a letter and may contain letters, numbers, and the special characters period (.) and underscore (_). No other special characters are allowed, and R is case sensitive.

Select the target variable: Select the data to be predicted. A target variable is also known as a response or dependent variable.

Select the predictor variables: Select the data used to predict the value of the target variable. A predictor variable is also known as a feature or an independent variable. Any number of predictor variables can be selected, but the target variable should not also be a predictor variable. Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.

Select Customize to modify the Model, Cross-validation, and Plots settings.

Omit a model constant: Select to omit a constant and have the best-fit line pass through the origin.

Use a weight variable for weighted least squares: Select a variable to determine the amount of importance to place on each record when creating a least-squares model.

Use regularized regression: Select to balance the minimization of the sum of squared errors against a penalty term on the size of the coefficients and produce a simpler model.

Enter value of alpha: Select a value between 0 (ridge regression) and 1 (lasso) to set how the penalty on the coefficients is weighted between the two.

Standardize predictor variables: Select to put all predictor variables on the same scale, based on the algorithm used.

Use cross-validation to determine model parameters: Select to perform cross-validation and obtain various model parameters.

Number of folds: Select the number of folds into which to divide the data.

What type of model: Select the type of model used to determine the coefficients. The options include the model with the lower in-sample standard error.

Set seed: Select to ensure the reproducibility of cross-validation, and select the value of the seed used to assign records to folds. Choosing the same seed each time the workflow is run guarantees that the same records will be in the same fold each time.

Use cross-validation to determine estimates of model quality: Select to perform cross-validation and obtain various model quality metrics and graphs. Some metrics and graphs are displayed in the static R output, and others are displayed in the interactive I output.

Number of trials: Select the number of times to repeat the cross-validation procedure. The folds are selected differently in each trial, and the overall results are averaged across all the trials. A higher number of folds results in more robust estimates of model quality, but fewer folds make the tool run faster.

Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi), 2x (192 dpi), or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.

Display graphs: Select to display graphs when using regularized regression.

Connect a Browse tool to each output anchor to view results.

O (Output): Displays the model name and size of the object in the Results window.

R (Report): Displays a summary report of the model that includes a summary and plots.

I (Interactive): Displays a dashboard of interactive visualizations to support further data discovery and model exploration.

Configure the Tool for In-Database Processing

The Linear Regression tool supports Oracle, Microsoft SQL Server 2016, and Teradata in-database processing. Visit In-Database Overview for more information about in-database support and tools.
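The model-name rule on this page (starts with a letter; letters, numbers, period, and underscore only) can be checked before a name is entered. The following is a minimal Python sketch of that rule. The function name is_valid_model_name and the pattern are illustrative assumptions written from the wording of the rule, not part of the tool.

```python
import re

# Assumed pattern, written from the rule stated on this page:
# first character is a letter; the rest are letters, digits, '.' or '_'.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def is_valid_model_name(name: str) -> bool:
    """Return True if the whole name satisfies the stated naming rule."""
    return MODEL_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Sales_Model.v2", "2nd_model", "my model"]:
        print(candidate, is_valid_model_name(candidate))
```

Because R is case sensitive, names such as Sales_Model and sales_model both pass this check but refer to different models.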
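To make the "Omit a model constant" and weight-variable options concrete, here is a small Python sketch of a one-predictor least-squares fit using the standard closed-form solutions. It is an illustration only: the tool itself uses R functions, and the function name fit_line and its parameters are hypothetical.

```python
def fit_line(x, y, weights=None, omit_constant=False):
    """Fit y = intercept + slope * x by (weighted) least squares.

    Returns (intercept, slope). With omit_constant=True the intercept is
    fixed at 0, so the fitted line passes through the origin.
    """
    w = list(weights) if weights is not None else [1.0] * len(x)
    if omit_constant:
        # Minimizes sum(w * (y - slope * x)^2) with no intercept term.
        slope = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
            wi * xi * xi for wi, xi in zip(w, x)
        )
        return 0.0, slope
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w  # weighted mean of y
    slope = sum(
        wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y)
    ) / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]  # points on y = 2x + 1
    print(fit_line(x, y))
    print(fit_line(x, y, omit_constant=True))
```

A record with weight 0 has no influence on the fit, and larger weights pull the line toward the corresponding records.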
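The role of alpha in regularized regression can be illustrated with the elastic-net penalty as documented for the glmnet package: alpha weights the lasso (absolute value) term, and 1 - alpha weights the ridge (squared) term. This page does not state the formula, so treat the Python sketch below as an assumption-labeled illustration. The name elastic_net_penalty and the lam parameter (the overall penalty strength) are hypothetical.

```python
def elastic_net_penalty(coefficients, alpha, lam=1.0):
    """Penalty added to the squared-error loss for a set of coefficients.

    Assumed form (as documented for glmnet):
        lam * (alpha * sum|b| + (1 - alpha) / 2 * sum(b^2))
    alpha = 0 gives a pure ridge penalty, alpha = 1 a pure lasso penalty.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    l1_term = sum(abs(b) for b in coefficients)  # lasso part
    l2_term = sum(b * b for b in coefficients)   # ridge part
    return lam * (alpha * l1_term + (1.0 - alpha) / 2.0 * l2_term)


if __name__ == "__main__":
    coefs = [3.0, -4.0]
    for a in (0.0, 0.5, 1.0):
        print(a, elastic_net_penalty(coefs, a))
```

Values of alpha strictly between 0 and 1 mix the two penalties, which is why the setting is entered as a number and not chosen as one of two options.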
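The interaction between the seed, the number of folds, and the number of trials can be sketched as follows. This is not the tool's R implementation. It is a minimal Python illustration, assuming a seeded shuffle is used to assign records to folds, and the function name assign_folds is hypothetical.

```python
import random


def assign_folds(n_records, n_folds, seed, n_trials=1):
    """Assign each record to a fold, once per trial.

    Returns a list with one entry per trial; each entry is a list where
    position i holds the fold number (0 .. n_folds - 1) of record i.
    """
    rng = random.Random(seed)  # same seed -> same sequence of shuffles
    trials = []
    for _ in range(n_trials):
        order = list(range(n_records))
        rng.shuffle(order)  # a fresh shuffle for every trial
        folds = [0] * n_records
        for position, record in enumerate(order):
            folds[record] = position % n_folds  # deal records out evenly
        trials.append(folds)
    return trials


if __name__ == "__main__":
    first_run = assign_folds(10, 3, seed=1, n_trials=2)
    second_run = assign_folds(10, 3, seed=1, n_trials=2)
    print(first_run == second_run)  # identical assignments with the same seed
```

Each trial reshuffles the records, so the folds usually differ between trials, while rerunning with the same seed reproduces every trial's assignment.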