#databricksdaily search results
How do you handle data skew with repartition()? If a single key is causing skew, I add a random salt (e.g. floor(rand()*N)) to the key to spread it across multiple partitions. This balances the workload, reduces long-tail tasks, and speeds up shuffles. #DatabricksDaily #Databricks…
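A minimal PySpark sketch of the salting idea above, assuming a hypothetical DataFrame read from `/mnt/raw/orders` with a hot key `customer_id` and an illustrative bucket count of N=8:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical skewed DataFrame; "customer_id" stands in for the hot key.
df = spark.read.parquet("/mnt/raw/orders")

N = 8  # number of salt buckets; tune to the degree of skew (assumption)

# Add a random salt so rows with the same hot key no longer hash into one partition.
salted = df.withColumn("salt", F.floor(F.rand() * N))

# Repartition on (key, salt): the skewed key is now spread over up to N partitions.
balanced = salted.repartition("customer_id", "salt")
```

If the salted column later feeds a join, the other side has to be replicated across the same N salt values so every bucket still finds its match.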
3/3 Too few partitions = slow, chunky tasks. Too many = pointless overhead. Balanced ones = beautiful pipeline runs. #DatabricksDaily #Databricks #DatabricksInterviewPrep #DatabricksPerformance
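A quick way to check which side of that trade-off a job is on; the input path and the target of 200 partitions are assumptions for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input; substitute your own table or path.
df = spark.read.parquet("/mnt/raw/events")

# Inspect the current partition count before deciding to scale up or down.
print(df.rdd.getNumPartitions())

# Full shuffle to an explicit, balanced partition count (200 is illustrative).
balanced = df.repartition(200)
```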
When is repartition(1) acceptable?
- Exporting small CSV/JSON to downstream systems
- Test data generation
- Creating a single audit/control file

#Databricks #DatabricksDaily #DatabricksBasics
What happens when you call repartition(1) before writing a table? Is it recommended? Calling repartition(1) forces Spark to shuffle all data across the cluster and combine it into a single partition. This means the final output will be written as a single file. It is like…
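A minimal sketch of the single-file export the two posts above describe, assuming a small hypothetical table `reporting.daily_summary` and an illustrative output path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumption: a small aggregate table that should land as one file.
summary = spark.table("reporting.daily_summary")

(
    summary
    .repartition(1)                # full shuffle into a single partition -> single output file
    .write
    .mode("overwrite")
    .option("header", True)
    .csv("/mnt/exports/daily_summary")  # hypothetical path
)

# When the data is already small, coalesce(1) gives the same single file
# without forcing a full shuffle.
summary.coalesce(1).write.mode("overwrite").option("header", True).csv("/mnt/exports/daily_summary")
```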
2/3 If the job has heavy joins/shuffles, I bump partitions up. If the dataset is tiny, I scale them down (no point in having 800 partitions for 2 GB). And honestly, AQE is a lifesaver: it fixes small or oversized partitions at runtime. #DatabricksDaily #Databricks…
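A sketch of the knobs that post refers to; the configs are standard Spark 3.x settings, but the specific values here are assumptions for illustration, not recommended defaults:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raise the shuffle partition count for jobs with heavy joins/shuffles (400 is illustrative).
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Let Adaptive Query Execution re-plan at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Merge undersized shuffle partitions after a stage completes.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# Split oversized partitions caused by skewed join keys.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```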