Machine Learning for Dummies

@DummiesMachine

A DS/ML enthusiast

In #Gaussian Naive Bayes, continuous values associated with each feature are assumed to be distributed according to a #Gaussian distribution. #bellshapedcurve #sameasnormaldistribution
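A minimal from-scratch sketch of this idea: each class stores a per-feature mean and variance, and prediction scores each class by its Gaussian log-likelihood plus log-prior. (All names here are illustrative, not from any particular library.)

```python
import math

def gaussian_pdf(x, mean, var):
    # Gaussian density assumed for each continuous feature
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(X, y):
    # per-class mean/variance for each feature, plus class priors
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        vars_ = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9
                 for j in range(d)]
        model[c] = (means, vars_, n / len(y))
    return model

def predict(model, x):
    # pick the class with the highest log joint likelihood
    best, best_score = None, -math.inf
    for c, (means, vars_, prior) in model.items():
        score = math.log(prior) + sum(
            math.log(gaussian_pdf(xj, m, v))
            for xj, m, v in zip(x, means, vars_))
        if score > best_score:
            best, best_score = c, score
    return best
```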


Gradient boosting machines are generally very slow to train because the models are fitted sequentially. Thus, #XGBoost focuses on computational speed and model performance.
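The sequential bottleneck can be seen in a toy boosting loop: each round must wait for the previous round's residuals before fitting the next weak learner (here a brute-force regression stump; a simplified sketch, not XGBoost's actual implementation).

```python
def fit_stump(X, y):
    # exhaustive search for the best single-split regression stump
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((yi - (lmean if x[j] <= t else rmean)) ** 2
                      for x, yi in zip(X, y))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    _, j, t, lmean, rmean = best
    return lambda x: lmean if x[j] <= t else rmean

def gradient_boost(X, y, n_rounds=20, lr=0.5):
    # models must be trained one after another: each stump fits the
    # residuals left by the current ensemble -- this is why plain GBMs are slow
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(x) for pi, x in zip(pred, X)]
    return lambda x: sum(lr * s(x) for s in stumps)
```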


#LightGBM beats the other algorithms when the dataset is extremely large. LightGBM is a gradient boosting framework that uses tree-based algorithms and follows a #leafwise growth approach, while most other algorithms grow trees in a #levelwise pattern.
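The difference between the two growth strategies can be sketched with made-up split gains (the `GAIN` numbers below are purely illustrative): level-wise expands every node on the current depth, leaf-wise always expands the single leaf with the highest gain.

```python
import heapq

# hypothetical split gain for each node id in a binary tree, just to
# illustrate growth order (node c's children are 2c+1 and 2c+2)
GAIN = {0: 10, 1: 9, 2: 1, 3: 8, 4: 7, 5: 0.5, 6: 0.2}

def children(node):
    return 2 * node + 1, 2 * node + 2

def level_wise(max_nodes):
    # expand every node on the current depth before moving deeper
    order, frontier = [], [0]
    while frontier and len(order) < max_nodes:
        nxt = []
        for node in frontier:
            if len(order) >= max_nodes:
                break
            order.append(node)
            nxt.extend(c for c in children(node) if c in GAIN)
        frontier = nxt
    return order

def leaf_wise(max_nodes):
    # always expand the leaf with the highest split gain (LightGBM-style)
    order, heap = [], [(-GAIN[0], 0)]
    while heap and len(order) < max_nodes:
        _, node = heapq.heappop(heap)
        order.append(node)
        for c in children(node):
            if c in GAIN:
                heapq.heappush(heap, (-GAIN[c], c))
    return order
```

With the same node budget the two strategies grow different trees: leaf-wise chases high-gain leaves deep before touching low-gain siblings.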


Since #DecisionTrees, #RandomForests and #XGBoost take care of missing values themselves, you do not have to impute them.
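One way such libraries handle this (XGBoost-style) is to learn a "default direction" for missing values at each split. A minimal sketch of that idea, with illustrative names, for a single regression split:

```python
import math

def best_nan_direction(split, X, y):
    # try routing the NaNs left and then right at a given split,
    # and keep whichever direction leaves less squared error
    def sse(rows):
        if not rows:
            return 0.0
        m = sum(rows) / len(rows)
        return sum((v - m) ** 2 for v in rows)

    def score(nan_left):
        left, right = [], []
        for xi, yi in zip(X, y):
            if math.isnan(xi):
                (left if nan_left else right).append(yi)
            elif xi <= split:
                left.append(yi)
            else:
                right.append(yi)
        return sse(left) + sse(right)

    return "left" if score(True) <= score(False) else "right"
```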


#Boosting in general decreases the bias error and builds strong predictive models. #Boosting has shown better predictive accuracy than #bagging, but it also tends to over-fit the training data #ensemblemodelling


#Earlystopping means halting training before the cost function reaches its minimum, so as to keep the weights mid-sized. The longer you train, the bigger your model's weights become, so stopping early simultaneously gives a #regularization effect
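In practice early stopping is usually driven by a validation metric with a patience window. A minimal sketch (the function and its signature are illustrative): feed it the per-epoch validation losses and it reports where training would have stopped.

```python
def train_with_early_stopping(val_losses, patience=3):
    # stop as soon as the validation loss has not improved for
    # `patience` consecutive epochs, rather than training to the
    # minimum of the training cost
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best
```

Here the run stops after three non-improving epochs and returns the epoch whose weights you would keep.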


A bit about regularization techniques - L2 shrinks all parameter weights a bit, so no single parameter has too much effect, while L1 removes some parameters entirely by driving their weights to 0 #machinelearning #datascience #statistics #modellingtechniques #AI #regression
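A simplified view of the two penalties via their shrinkage (proximal) operators, which is one way to see why L1 produces exact zeros while L2 only scales weights down:

```python
def l2_shrink(weights, lam):
    # ridge-style update: every weight is scaled down, none becomes exactly 0
    return [w / (1 + lam) for w in weights]

def l1_shrink(weights, lam):
    # lasso-style soft-thresholding: weights smaller than lam in magnitude
    # are driven exactly to 0, larger ones are shifted toward 0
    return [max(abs(w) - lam, 0.0) * (1 if w > 0 else -1) for w in weights]
```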


SVM selects the hyper-plane which classifies the classes accurately PRIOR to maximizing the margin. #machinelearning #datascience #statistics #modellingtechniques #AI #regression #classification #MLalgorithms


A median is not affected by outliers. So when you see your dataset having too many outliers, use the #median instead of the #mean for #binning purposes.
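A quick illustration with the standard library: one extreme value drags the mean far from the bulk of the data while the median barely moves.

```python
from statistics import mean, median

def central_tendency(values):
    # with outliers, the median stays near the bulk of the data
    # while the mean is dragged toward the extreme value
    return mean(values), median(values)
```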


There are various methods to calculate the similarity between two objects while building a #recommendationsystem. Distance scores like #euclidean, #cosine and #pearson #correlation can be used; there is no universally good or bad choice. Try all three and see which gives better results.
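The three scores side by side, in plain Python (note that Pearson correlation is just cosine similarity after centering each vector on its mean):

```python
import math

def euclidean(u, v):
    # straight-line distance between the two vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    # cosine of the angle between the vectors (1 = same direction)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(u, v):
    # cosine similarity of the mean-centered vectors
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])
```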


#collaborative methods are largely divided into #memorybased and #modelbased. Memory-based models rely on user-user or item-item frameworks, while model-based ones rely on methods like #matrixfactorization to deal with sparse datasets #machinelearning #datascience #statistics
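A bare-bones sketch of matrix factorization by stochastic gradient descent (all parameter choices below are illustrative defaults, not from any library): a sparse list of (user, item, rating) triples is factored into low-dimensional user and item feature vectors whose dot product approximates the rating.

```python
import random

def matrix_factorize(ratings, n_users, n_items, k=2, lr=0.05,
                     steps=10000, seed=0):
    # factor sparse (user, item, rating) triples into k-dim feature vectors
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u, i, r = ratings[rng.randrange(len(ratings))]
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        err = r - pred
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * err * qi   # nudge user factors toward the rating
            Q[i][f] += lr * err * pu   # nudge item factors toward the rating
    return P, Q
```

Unobserved (user, item) pairs then get predicted ratings from the same dot product, which is how the factorization fills in the sparse matrix.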


#Recommender systems are broadly divided into #contentbased and #collaborativefiltering. In the former, you already know the feature vectors that describe the product to be recommended, while in the latter, product and user features are learned incrementally.


Whenever you see terms like ||u||, it basically means the length of #vector u, which is the square root of the sum of its squared x and y projections #machinelearning #datascience #statistics #modellingtechniques #AI #regression #classification #MLalgorithms
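In code, for a vector with any number of components:

```python
import math

def norm(u):
    # ||u||: square root of the sum of squared components
    return math.sqrt(sum(c * c for c in u))
```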


While #sigmoid activation is used for binary classification tasks, #softmax activation is used for multi-class classification tasks #machinelearning #datascience #statistics #modellingtechniques #AI #regression #classification #MLalgorithms


Some info on activation functions. #sigmoid converts outputs to between 0 and 1, #tanh to between -1 and 1, #relu outputs 0 for anything less than zero, and #softmax collapses outputs into values between 0 and 1 that sum to 1 - used in #multiclass #classification
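All four in a few lines of plain Python:

```python
import math

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh(x):
    # squashes any real number into (-1, 1)
    return math.tanh(x)

def relu(x):
    # zero for negative inputs, identity otherwise
    return max(0.0, x)

def softmax(xs):
    # exponentiate (shifted by the max for numerical stability)
    # and normalize so the outputs sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```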

