allthingsdata_'s profile picture. 😁📈📊#dataanalytics #datarocks

_allthingsdata_

@allthingsdata_

😁📈📊#dataanalytics #datarocks

Python: Extract day from week number (YYYY-W01) format column monthly["new_col"] = monthly['weekn_column'].apply(lambda x: datetime.datetime.strptime(x + '-1', "%Y-W%W-%w")) Creates a new datetime column with YYYY-MM-DD.


SQL lpad() function: LPAD('name', 7, 'sur'); Result: 'surname' Pads the left-side of a string with a specific set of characters


Python loop with zip(): def lzip(df): t = 0 for a, b in zip(df['A'], df['B']): t += a + b return t Takes several iterables making an iterator that aggragates elements from each iterable.


Python cut() function: pd.cut(df[‘col_name’], 6) Bins values into discrete intervals.


Python use of idxmin() and idxmax(): df.loc[df[‘Salary’].idxmax()][‘Name’]


SQL running totals using OVER clause: SELECT ID, Date, Amount, SUM(Amount) OVER(PARTITION BY Date ORDER BY ID) as RunningTotal FROM Sales ORDER BY ID, Amount


Python counting occurrences: n = [1,3,2,7,5,2,3,3] get_threes = list(filter(lambda n: n == 3, n)) count_threes = len(threes)


Python sorting: my_array.sort() - sorts and mutates the array in place np.sort(my_array) - returns a copy of the sorted array without mutating the original array


Python most frequent: freq = [3, 5, 2, 2, 1, 2, 3, 2] print(max(set(freq), key = freq.count)) # 2 Finds the most frequently occurring value in a list or string.


SQL data loading in chunks: SELECT col_names FROM table ORDER BY col_1 OFFSET @PageSize * @PageNumber ROWS FETCH NEXT @PageSize ROWS ONLY; A paging query where @PageSize is the number of rows you want in each chunk and @PageNumber is the iteration number.


Python formatting: To simultaneously perform several independent replacements use - replace(). Eg. replace a comma and dollar sign with ' ' string, then convert column values to floats: df['col_name'] = df.col_name.replace('[$,]', ' ', regex = True).astype('float')


_allthingsdata_ reposted
DavidVineyard9's tweet image.

Python functions: factorize() df['newcol_name'] = df.col_name.factorize()[0] Encodes each value from an existing column as an integer.


Array similarity: allclose() Returns True if items in two arrays are similar within a specific tolerance.


Numpy function: intersect1d() Returns the values that intersect in two arrays i.e. the items that are common in both arrays.


Numpy functions: argmin() argmax() Return the indices of the minimum resp. maximum values.


Python: Get the index of the maximal (or minimal) element in a list or tuple: x = [3, 1, 12, 3, 8] max((item, i) for i, item in enumerate(x))[1] 2


SQL sampling -- selects 20 percent of the data: SELECT * FROM table WHERE random() > 0.8


Loading...

Something went wrong.


Something went wrong.