Seattle Data Guy
  • 254 videos
  • 4,941,391 views
Going From Data Engineer To Head Of Data - How To Run A Data Team Successfully
You join a 1000 person company as the head of data. What should you do?
I would invest a lot of time up front to understand the business (especially if you haven't worked in the industry).
I was just talking to someone at the Snowflake Summit who told me that when they were recently put in charge of a data team, they made the mistake of responding first with "Great, what tools can I use?"
If you can't answer the following questions within the first month or two, you're probably going in the wrong direction.
- What are the main drivers of the business?
- What are the main pain points of the customers?
- What does the business flow look like?
- Who is our customer?
- (What other questions would you add?)
Mo...
Views: 3,110

Videos

Apache Spark Vs Apache Flink - Looking Through How Different Companies Approach Spark And Flink
3.5K views · 14 days ago
As data increased in volume, velocity, and variety, so, in turn, did the need for tools that could help process and manage those larger data sets coming at us at ever faster speeds. As a result, frameworks such as Apache Spark and Apache Flink became popular due to their abilities to handle big data processing in a fast, efficient, and scalable manner. But we often find that sometimes it can be...
Intro To Databricks SQL AI Functions - 5 SQL AI Functions Databricks Has And How To Use Them
2.4K views · 28 days ago
Databricks and Snowflake have been releasing various forms of AI SQL functionality. So I asked Josue Bogran if he'd walk through using some of Databricks SQL AI functions that they just put out! If you'd like to learn more about Databricks you should follow Josue here- www.linkedin.com/in/josuebogran/ If you enjoyed this video, check out some of my other top videos. Top Courses To Become A Data...
If I could give advice to myself when starting as a data engineer
4.8K views · 1 month ago
We all get stuck in our careers. Whether it's because of the team we're on, the point of life we're in, etc. So I wanted to talk about some tips I have for people looking to supercharge their data engineering career. If you enjoyed this video, check out some of my other top videos. Top Courses To Become A Data Engineer ruclips.net/video/kW8_l57w74g/видео.html What Is The Modern Data Stack - Int...
Data Modeling Where Theory Meets Reality - How Different Companies I Worked At Modeled Their Data
10K views · 1 month ago
Data modeling varies at different companies. At Facebook we had plenty of storage and often treated historical data modeling very differently compared to when I worked at an enterprise. The concept of slowly changing dimensions wasn't as prevalent and instead we simply stored snapshots of data every day. So let's talk about modeling historical data and how it varied. If you enjoyed this video, ...
How To Escape The Rat Race - 6 Tips I Wish I Had Before I Became An Independent Consultant
3.3K views · 2 months ago
It feels like everyone is trying to quit their 9-5. OK, that could just be some level of bias, but there are plenty of people looking to escape the rat race. The question becomes how, and what the pitfalls are along the way. In this video I'll provide the advice I wish I had when I started my journey. I hope it helps! If you enjoyed this video, check out some of my other top videos. The Ultima...
What Is S3 And How Can You Query It With AWS Athena - AWS Data Engineering 101
3.1K views · 2 months ago
S3 is a commonly used AWS solution for data lakes and staging areas. Data engineers need AWS and it also supports so many other solutions like Snowflake when hosted on AWS. So what is S3 and how can data engineers use it? How can data engineers use it to read from AWS Athena? Also, I reference a video that shows how to set up an S3 Snowpipe integration, here is the link from @mastering_snowflak...
What Tools Should Data Engineers Know In 2024 - 100 Days Of Data Engineering
30K views · 2 months ago
What tools should a data engineer know? Honestly this video is more of a list of tools that goes far beyond what most data engineers know but I wanted to create a video that shared a list of data engineering tools for the 100 days of data engineering video. So here it is! Also, if you're looking to check out some tools that I am advising for you can look at Estuary for data ingestion bit.ly/3Ed...
Using AWS Lambda As A Data Engineering - Automating An API Extract With AWS Lambda And Eventbridge
4.4K views · 3 months ago
I recently posted the first video in a series about AWS and data engineering. This is the second video, where we will dive into how you can use AWS Lambda to perform automations to scrape data from an API. You can find the basic code here - gist.github.com/bAcheron/8945c0c0ecd59df5e02397a35ba445e6 Also, if you'd like to see the prior video you can find it here: AWS Services You Need To Know As A ...
Best AWS Services You Need To Know As A Data Engineer - How To Become A Data Engineer
6K views · 3 months ago
Optimizing Your Data Infrastructure - How To Become A Better Data Engineer
6K views · 4 months ago
Data Modeling - Walking Through How To Data Model As A Data Engineer - Dimensional Modeling 101
24K views · 4 months ago
How And Why Data Engineers Need To Care About Data Quality Now - And How To Implement It
9K views · 4 months ago
Fastest way to Start Your Data Engineer Journey in 2024 - 100 Days Of Data Engineering Crash Course
70K views · 5 months ago
The Ultimate Guide To Starting An Independent Consulting Company In 2024 | Data Consulting 101
12K views · 5 months ago
Data Modeling - Why Data Engineers Need To Understand It - An Introduction To Data Engineering
30K views · 6 months ago
What Is Apache Druid And Why Do Companies Like Netflix And Reddit Use It?
7K views · 7 months ago
The Realities Of Airflow - The Mistakes New Data Engineers Make Using Apache Airflow
13K views · 8 months ago
Data Architects Vs Data Engineers - Is There A Difference?
11K views · 9 months ago
What Is Docker - Docker Intro And Tutorial On Setting Up Airflow | High Paying Data Engineer Skills
7K views · 9 months ago
How To Fast Track Your Data Engineering Career - Translating Business Requirements Into Value
7K views · 10 months ago
Everyone's Data Infrastructure Is A Mess - The Truth About Working As A Data Engineer
7K views · 10 months ago
Data Modeling Challenges - The Issues Data Engineers & Architects Face When Implementing Data Models
24K views · 11 months ago
Why I Left Data Science - And Picked Data Engineering Instead
16K views · 11 months ago
What Is Change Data Capture - Understanding Data Engineering 101
9K views · 1 year ago
How I'd Become A Data Engineer (If I had to start over as a data analyst in 2023)
61K views · 1 year ago
A Decade In Data Engineering - Has Anything Actually Changed?
9K views · 1 year ago
Data Engineering Vs Machine Learning Pipelines - What Is The Difference
8K views · 1 year ago
Will Data Engineering Exist In 5 Years - Is Data Engineering A Good Career Choice?
53K views · 1 year ago
Can AI Code A Data Engineering Project - Using ChatGPT To Code A Python Project
6K views · 1 year ago

Comments

  • @ollienicholson
    @ollienicholson 1 hour ago

    Hey youtube.com/@SeattleDataGuy, love your videos so far! Was curious if you'd like to add your insight into the following terms: batch processing vs. stream processing, and OLTP vs. OLAP?

  • @RedCodra_
    @RedCodra_ 23 hours ago

    Love hearing independent consulting best practices and what to expect! How do you approach accessing these clients' data and systems as a third party? Do they typically just give you a license (in the case of, say, Microsoft 365) as they would a W2 employee?

    • @SeattleDataGuy
      @SeattleDataGuy 12 hours ago

      Sometimes, though it's usually more of a "seat" than a license. I haven't often needed Microsoft products that are installed on a laptop, and when I have, the client has sent me a physical laptop.

  • @FirstNameLastName-fv4eu
    @FirstNameLastName-fv4eu 1 day ago

    This guy is the best example of what happens when you spend 10 years of your professional life in the "super-cheap-money world": a smart kid with a very vague idea of the real world :)

    • @SeattleDataGuy
      @SeattleDataGuy 1 day ago

      You think I am smart? Shucks. What is the real world to you?

    • @FirstNameLastName-fv4eu
      @FirstNameLastName-fv4eu 1 day ago

      @SeattleDataGuy Try explaining the same reasoning to a bank, where people don't evaluate a technology on "how much money" it has raised. Your generation is just spoiled or scammed by cheap money culture.

    • @SeattleDataGuy
      @SeattleDataGuy 12 hours ago

      Who do you think is responsible for cheap money culture?

  • @DataPains
    @DataPains 2 days ago

    Used it for years, I also tried the later 2.x version, I still don't like it, and I think there are better ways of architecting pipelines. But yeah I was amazed when I saw Airflow the first time, and it did solve a lot of problems, but I still think, it is a tool of the past. I hope I am wrong!

    • @SeattleDataGuy
      @SeattleDataGuy 12 hours ago

      It's been a decade, so I wouldn't be surprised to see it replaced in the next 5 years. But you never know, some things are hard to get rid of.

  • @DanielKamau-ku5cs
    @DanielKamau-ku5cs 3 days ago

    Not clearly explained, just bs .

  • @Kira-ji5pr
    @Kira-ji5pr 4 days ago

    I’m thinking of switching from full stack to data engineering. Any advice?

    • @SeattleDataGuy
      @SeattleDataGuy 12 hours ago

      Is there a reason you want to switch?

  • @William-B
    @William-B 4 days ago

    We’re a young data team for a large organization. Biggest roadblocks for us are issues with data governance (“you can’t have or report on our data”), budget for tooling (“prove the value of the tool, then we can purchase it”), and cloud concerns (“all my data is on-prem. You can’t just put it in the cloud”)

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      Yeah, those are always a struggle. In some companies you'll never win that battle (until leadership gets changed out); in other cases you have to be willing to speak your mind and say, "Hey, I can't do XYZ, which you asked me to do, under these conditions, so either things stay the same or you start opening doors". But that's easier to say as a consultant because I don't mind ending a project if a client won't work with me to get to the goal they wanted to reach (I've never had to go that far).

  • @smrtysam
    @smrtysam 5 days ago

    This has happened to me. Now I’m leading a team of data scientists, engineers, analysts and migration specialists. I’ve had to learn so much so quickly about strategy and people management. I’ve had to coach the people on my team to feel empowered and own their tasks. At the beginning of being head of data I was taking on way too many “low level tasks”. Now I’m delegating and empowering. I still have a lot to learn though.

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      This is an awesome story of growth. Any tips for future heads of data?

  • @crisithink9509
    @crisithink9509 5 days ago

    I wonder how much Data God has in the Aether/Astral realm 🤔

  • @SeattleDataGuy
    @SeattleDataGuy 5 days ago

    If you're looking for help setting up your data team and strategy, then feel free to set up a free consultation here - calendly.com/ben-rogojan/consultation

  • @Ian-vh2vv
    @Ian-vh2vv 5 days ago

    Just went through this process with my company over the past year. Great video. With us it went something like:
    - Where is all of our data?
    - How are we doing reporting now?
    - What are the shortcomings of existing reporting solutions?
    - Do we need a warehouse? (Yes)
    - What warehouse do we pick?
    - What ETL stack makes sense for our use case?
    - What do we integrate, in what order, to maximize value and get adoption rolling?
    Also, having someone at the exec level champion the BI effort and really push it forward was huge for the thing to actually materialize.

    • @SeattleDataGuy
      @SeattleDataGuy 5 days ago

      Thanks for sharing! I really appreciate it when people add more context and their own experiences. Were there any gotchas you ran into while going through this process?

    • @baw5xc333
      @baw5xc333 4 days ago

      How long did this rollout take?

    • @Ian-vh2vv
      @Ian-vh2vv 4 days ago

      @baw5xc333 About 6 months from step 1 until I started development (first Snowflake table and started integrating our first source system)

  • @sirus312
    @sirus312 5 days ago

    I keep hearing from top CEOs that with Palantir we don't need teams anymore

    • @SeattleDataGuy
      @SeattleDataGuy 5 days ago

      I'd love to believe this! I guess the reason I have a hard time believing it is that I know there are lots of consultants who work in the space of setting up Palantir, which suggests that it still requires technical skills to set up and work with (this is also based on a few conversations I have had with people working with Palantir). But always happy to be wrong.

  • @hakeem1340
    @hakeem1340 5 days ago

    Thank you for sharing

  • @hantt
    @hantt 5 days ago

    The DE role should not exist; it should just be an SDE who also owns data as a product. Kind of like front end and back end, there will be a data-focused engineer that we can call a data engineer. Oh wait

  • @nathannguyen2041
    @nathannguyen2041 6 days ago

    Hm. Makes me think that I should DM the data engineer that I vaguely know and have communicated with once or twice on Slack about what kind of work he does and if I would be able to work on low priority projects. Any recommended ice breakers?

  • @crypt_hodl
    @crypt_hodl 7 days ago

    Interested! Can you please offer special pricing for people in Africa? A 50% reduction is good, but our earnings are far lower, probably 20x less than those in the US or Europe. It becomes difficult for us to participate in good courses like this. Any help? Thanks.

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      Sure, happy to give a coupon, here is one for 80% off, once there are none left there are none left - lifetime_80

  • @madihenry7861
    @madihenry7861 7 days ago

    Hi! Can you please share the full screen for what you have typed under the config_file?

  • @data-dynamo-guy
    @data-dynamo-guy 7 days ago

    I also find myself building stuff rather than analyzing business problems @@

    • @SeattleDataGuy
      @SeattleDataGuy 7 days ago

      It's always interesting how we all come to the same conclusion, thanks for watching!

  • @Aristocle
    @Aristocle 8 days ago

    Is there a service or scripting language that allows me to write relationships between tables/databases in a modern material design style?

  • @serk-s
    @serk-s 9 days ago

    Man, you really need to stop pitching your voice higher at the end of your sentences :(

    • @SeattleDataGuy
      @SeattleDataGuy 7 days ago

      fair enough, on the flip side i have picked up a vocal fry trying to do that lol

  • @richardmartin6605
    @richardmartin6605 9 days ago

    Would love to see article reviews!

  • @initialb811
    @initialb811 10 days ago

    This is really awesome. Would love to see more of this!

  • @TJInTech10
    @TJInTech10 10 days ago

    thx for breaking it down

    • @SeattleDataGuy
      @SeattleDataGuy 7 days ago

      glad you found it helpful!

    • @TJInTech10
      @TJInTech10 7 days ago

      @SeattleDataGuy yes, thx. I'm trying to understand how knowledge graphs/vector DBs will integrate into this too. Is it safe to assume both will be essential pieces of the enterprise AI layer/stack now being invested in heavily, or do you see one being more relevant in the next 2-5 yrs?

  • @SentinelaKosmos
    @SentinelaKosmos 14 days ago

    Don’t just be a task taker, be a strategic player.

    • @SeattleDataGuy
      @SeattleDataGuy 7 days ago

      thanks for reading my articles and watching my videos!

  • @B-gaming930-fl5qr
    @B-gaming930-fl5qr 15 days ago

    E5 is where it's at 750 Million 😂

  • @osoucy
    @osoucy 15 days ago

    To me, one of the main benefits of Spark Structured Streaming is that you can easily switch between near real-time (micro-batches) and scheduled batch processing without having to rewrite a single line of code. This is a very effective way of scaling up and down and balancing costs vs latency.

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      that is very useful! when do you think micro-batches make the most sense?
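
The switch described in this thread usually comes down to the trigger passed to the streaming writer. Below is a minimal sketch, assuming PySpark's `DataStreamWriter.trigger()` with its `processingTime` and `availableNow` keyword arguments (the latter needs Spark 3.3 or later). The helper name `trigger_options`, the mode names, and the paths in the usage comment are hypothetical and only for illustration.

```python
# Hypothetical helper: maps a run mode to keyword arguments for
# PySpark's DataStreamWriter.trigger(). Only the trigger changes
# between modes; the transformation code stays the same.

def trigger_options(mode: str, interval: str = "1 minute") -> dict:
    """Return kwargs for `df.writeStream.trigger(**kwargs)`."""
    if mode == "micro_batch":
        # Near real-time: start a new micro-batch every `interval`.
        return {"processingTime": interval}
    if mode == "scheduled_batch":
        # Process everything currently available, then stop;
        # an external scheduler (cron, Airflow, ...) reruns the job.
        return {"availableNow": True}
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with PySpark (not executed here; assumes an existing streaming
# DataFrame `events` and made-up paths):
#
#   (events.writeStream
#          .format("parquet")
#          .option("checkpointLocation", "/tmp/checkpoints/events")
#          .trigger(**trigger_options("scheduled_batch"))
#          .start("/tmp/output/events"))
```

Because the checkpoint tracks what has already been processed, the same job can probably be moved between the two modes by changing only the mode string, which is the cost-versus-latency lever the comment describes.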

  • @cestlachance7575
    @cestlachance7575 16 days ago

    Is this really a good video? I feel like he just namedrops every tech

  • @moussaelaqqaoui
    @moussaelaqqaoui 17 days ago

    Hello ben, can we have a discussion please !

  • @DataPains
    @DataPains 17 days ago

    Great video! Thank you for sharing!

  • @danhorus
    @danhorus 19 days ago

    13:03 in Spark, we avoid Python UDFs like the plague because they're much slower than native Spark code. I wonder if the same is true for Flink, given that it also runs on JVMs. A quick Google search indicates that vectorized UDFs are a thing in Flink too, so I assume the same limitations apply

    • @SeattleDataGuy
      @SeattleDataGuy 19 days ago

      Thanks for the added context! It's much appreciated. I am now wondering if I have ever had a good experience with a UDF 🤣. I always remember touting them, but even in the one case where I do recall trying it out on SQL Server, we found it slow.

    • @danhorus
      @danhorus 19 days ago

      @SeattleDataGuy With Spark, there are several ways to write transformations. By far, the best option is to use native Spark functions, as they compile to highly optimized and parallelized Java byte code. The second best option is to write UDFs in Scala or Java, as everything still runs in the same JVM. The third best option, in case you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to move data between the JVM and the Python interpreter in batches. Finally, as a last resort, you can use regular Python UDFs; however, they're a lot slower because they basically compute results row by row rather than in big batches. If you have slow Spark jobs using Python UDFs, refactoring them is usually a good way to gain some performance. About this blog post, I'm not sure the author is aware of this limitation, but if they need this code to run very, very fast, they should probably avoid Python UDFs too

    • @danhorus
      @danhorus 19 days ago

      @SeattleDataGuy I wrote a long comment about the different types of UDFs in Spark, but apparently YouTube decided to delete it. Maybe you'll find it marked as spam, lol

    • @SeattleDataGuy
      @SeattleDataGuy 19 days ago

      @danhorus Did you put a URL in it? That seems to be the main reason I have seen YouTube define things as spam. I'll look

    • @danhorus
      @danhorus 18 days ago

      Not really, but let's try again, haha. In Spark, there are many ways to apply data transformations. By far the best option is to use native Spark functions, as they compile to highly optimized/parallelized Java byte code. The second best option to maximize performance is to use Scala or Java UDFs, as they run inside the JVM with a minor performance hit. The third option, if you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to transfer big batches of records to the Python interpreter and back to the JVM after processing. Finally, the last option you should consider is the regular Python UDF, as it basically transforms row by row and has much worse performance as a result. If you have a slow Spark job, refactoring Python UDFs can make it a lot faster. I'm not sure the authors of the blog post are aware of this, but they can probably make their code faster too
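
The row-by-row versus batched distinction in this thread can be illustrated without Spark. The sketch below is a standard-library analogy only, not Spark code: it counts how often the user function is invoked under each style. In real Spark, a vectorized (Pandas) UDF receives a `pandas.Series` per batch and the data moves via Apache Arrow, as the comment says; the function names and the batch size of 1,000 here are assumptions made for the example.

```python
def apply_row_udf(rows, udf):
    """Like a regular Python UDF: the function is invoked once per row."""
    return [udf(row) for row in rows]


def apply_batch_udf(rows, udf, batch_size=1000):
    """Like a vectorized UDF: the function is invoked once per batch."""
    out = []
    for start in range(0, len(rows), batch_size):
        out.extend(udf(rows[start:start + batch_size]))
    return out


# Count invocations of each user function.
calls = {"row": 0, "batch": 0}


def double_row(x):
    calls["row"] += 1
    return x * 2


def double_batch(xs):
    calls["batch"] += 1
    return [x * 2 for x in xs]


rows = list(range(10_000))
# Both styles produce the same result...
assert apply_row_udf(rows, double_row) == apply_batch_udf(rows, double_batch)
# ...but with very different numbers of function calls.
print(calls)  # {'row': 10000, 'batch': 10}
```

Each invocation carries fixed overhead, so fewer and larger calls tend to be cheaper. That is the intuition behind preferring native functions or vectorized UDFs over plain Python UDFs, though the actual speedup in a Spark job depends on the workload.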

  • @jace743
    @jace743 19 days ago

    I’d watch if you did live article reviews!

    • @SeattleDataGuy
      @SeattleDataGuy 19 days ago

      Yeah! I think watching other creators do it, I really gotta slow down to do it well

  • @ankittjindal
    @ankittjindal 19 days ago

    Recommend me some books, as I only have an idea of Python and SQL. Which book is best for me as a beginner in the data engineering field?

  • @damien__j
    @damien__j 19 days ago

    Great video thanks!

  • @knkootbaoat6759
    @knkootbaoat6759 19 days ago

    Gotta make things complex, otherwise we wouldn't get paid as much. I half joke. We don't make it complex; it's just that situations are inherently complex

    • @SeattleDataGuy
      @SeattleDataGuy 19 days ago

      we do tend to do that sometimes...

  • @AyushMandloi
    @AyushMandloi 19 days ago

    Sound of transition is very loud

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      I am reducing these moving forward

  • @prico3358
    @prico3358 20 days ago

    Better crossover than a Batman & Iron Man movie.

  • @tommynelson4795
    @tommynelson4795 22 days ago

    Minor tip. I’d recommend removing the very high pitch transitions from your videos. I thought my tinnitus was acting up haha. Other than that great vid!

  • @user-ux4iu7us7p
    @user-ux4iu7us7p 23 days ago

    What are your thoughts on the new AWS Data Engineering Certification?

  • @elcoxeroni8273
    @elcoxeroni8273 24 days ago

    Thank you for this really great content! Which is the book you are referring to in your video? I like the structure much and am considering buying it. Thanks in advance!

  • @mrgenetics4063
    @mrgenetics4063 25 days ago

    I want to become a data scientist or engineer… My biology degree has never brought me financial security, and I hope to be rich one day

  • @otavioattuy5394
    @otavioattuy5394 26 days ago

    Where do I find the theory behind the "types" of dimension tables?

  • @glstnlev
    @glstnlev 28 days ago

    Interesting use case about SCD2, but how in practice do we create these tables? I understand the importance and how useful it is to have a new row for each change, but I can’t get how to model it to make it work
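
One common way to answer the question above is to add validity columns to the dimension table, then close the current row and insert a new one whenever a tracked attribute changes. The following is a minimal sketch using Python's built-in `sqlite3`. The table `dim_customer`, its columns, and the helper `apply_change` are hypothetical names chosen for illustration, not anything shown in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_id INTEGER,
        city        TEXT,
        valid_from  TEXT,
        valid_to    TEXT,      -- NULL while the row is current
        is_current  INTEGER
    )
""")


def apply_change(conn, customer_id, city, as_of):
    """Close the current row if the attribute changed, then insert a new one."""
    current = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return  # nothing changed, keep the current row
    # Expire the existing current row (a no-op for a brand-new customer).
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (as_of, customer_id),
    )
    # Insert the new version as the current row.
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, as_of),
    )


apply_change(conn, 1, "Seattle", "2024-01-01")
apply_change(conn, 1, "Seattle", "2024-02-01")  # no change, no new row
apply_change(conn, 1, "Denver", "2024-03-01")   # change, new version

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY valid_from"
).fetchall()
print(history)
# [('Seattle', '2024-01-01', '2024-03-01', 0), ('Denver', '2024-03-01', None, 1)]
```

In a warehouse the same close-and-insert logic is often written as a single `MERGE` statement or handled by a tool's snapshot feature; the two-step version here just makes the mechanics visible.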

  • @abrahamgomez653
    @abrahamgomez653 28 days ago

    I love learning about data engineering and overall cloud computing. Cloud is the future.

  • @DerekGatlin
    @DerekGatlin 28 days ago

    Thank you guys so much for your transparency- it is refreshing and I am more interested in working with you in the future as a result.

  • @septic7
    @septic7 28 days ago

    Are these salaries adjusted for 2024 tranches ? 😅🥲

    • @SeattleDataGuy
      @SeattleDataGuy 11 hours ago

      haha, I did just have a friend tell me they got a 400k offer from FB at an IC5 level if that helps provide a data point

  • @maxonthetrack
    @maxonthetrack 29 days ago

    awesome! I enjoy learning about these AI concepts in this hands-on practical way

  • @NoahPitts713
    @NoahPitts713 29 days ago

    Josue is the man! thank you both for the great conversation

  • @poorbadger
    @poorbadger 29 days ago

    Re: SQL Serverless…. Databricks now has job/workflow serverless which works with notebooks - a few limitations but most functionality is supported. I still use SQL all the time but that’s made the cluster start up penalty w notebooks way better

  • @saadoa4969
    @saadoa4969 29 days ago

    Disappointing to know that you don't answer your viewers' emails. Solid content though

    • @SeattleDataGuy
      @SeattleDataGuy 29 days ago

      I do my best! I am always playing catch up, but thank you for the support!

  • @SreejaThumma
    @SreejaThumma 29 days ago

    Can you also make a video on the difference between Databricks, Snowflake and Solix technologies?