_allthingsdata_
@allthingsdata_
😁📈📊#dataanalytics #datarocks
You might like
Python: Extract day from week number (YYYY-W01) format column monthly["new_col"] = monthly['weekn_column'].apply(lambda x: datetime.datetime.strptime(x + '-1', "%Y-W%W-%w")) Creates a new datetime column with YYYY-MM-DD.
SQL lpad() function: LPAD('name', 7, 'sur'); Result: 'surname' Pads the left-side of a string with a specific set of characters
Python loop with zip(): def lzip(df): t = 0 for a, b in zip(df['A'], df['B']): t += a + b return t Takes several iterables making an iterator that aggragates elements from each iterable.
Python cut() function: pd.cut(df[‘col_name’], 6) Bins values into discrete intervals.
Python use of idxmin() and idxmax(): df.loc[df[‘Salary’].idxmax()][‘Name’]
SQL running totals using OVER clause: SELECT ID, Date, Amount, SUM(Amount) OVER(PARTITION BY Date ORDER BY ID) as RunningTotal FROM Sales ORDER BY ID, Amount
Python counting occurrences: n = [1,3,2,7,5,2,3,3] get_threes = list(filter(lambda n: n == 3, n)) count_threes = len(threes)
Python sorting: my_array.sort() - sorts and mutates the array in place np.sort(my_array) - returns a copy of the sorted array without mutating the original array
Python most frequent: freq = [3, 5, 2, 2, 1, 2, 3, 2] print(max(set(freq), key = freq.count)) # 2 Finds the most frequently occurring value in a list or string.
SQL data loading in chunks: SELECT col_names FROM table ORDER BY col_1 OFFSET @PageSize * @PageNumber ROWS FETCH NEXT @PageSize ROWS ONLY; A paging query where @PageSize is the number of rows you want in each chunk and @PageNumber is the iteration number.
Python formatting: To simultaneously perform several independent replacements use - replace(). Eg. replace a comma and dollar sign with ' ' string, then convert column values to floats: df['col_name'] = df.col_name.replace('[$,]', ' ', regex = True).astype('float')
Python functions: factorize() df['newcol_name'] = df.col_name.factorize()[0] Encodes each value from an existing column as an integer.
Array similarity: allclose() Returns True if items in two arrays are similar within a specific tolerance.
Numpy function: intersect1d() Returns the values that intersect in two arrays i.e. the items that are common in both arrays.
Numpy functions: argmin() argmax() Return the indices of the minimum resp. maximum values.
Python: Get the index of the maximal (or minimal) element in a list or tuple: x = [3, 1, 12, 3, 8] max((item, i) for i, item in enumerate(x))[1] 2
SQL sampling -- selects 20 percent of the data: SELECT * FROM table WHERE random() > 0.8
United States Trends
- 1. CarPlay 3,143 posts
- 2. Osimhen 58.3K posts
- 3. Cynthia 98.5K posts
- 4. Megyn Kelly 15.7K posts
- 5. Katie Couric 6,781 posts
- 6. Senator Fetterman 7,983 posts
- 7. Black Mirror 4,027 posts
- 8. Padres 27.2K posts
- 9. #WorldKindnessDay 14.8K posts
- 10. Gabon 105K posts
- 11. Vine 15.2K posts
- 12. #NGAGAB 13.5K posts
- 13. Massie 96.6K posts
- 14. Woody Johnson N/A
- 15. #LoveDesignEP7 193K posts
- 16. Bonhoeffer 3,544 posts
- 17. Sheel N/A
- 18. RIN AOKBAB BEGIN AGAIN 192K posts
- 19. ariana 82.8K posts
- 20. #DirtyDonald 4,184 posts
Something went wrong.
Something went wrong.