Data Engineering Projects | GitScrum

Manage data engineering projects with GitScrum. Track data pipelines, ETL development, and data quality for analytics teams.


How do you use GitScrum for data engineering projects?

Manage data engineering work in GitScrum by labeling tasks by pipeline type, tracking ETL development through a standard workflow, and documenting data models in NoteVault. Coordinate with analysts on requirements, track data quality issues, and manage infrastructure work. Organized data teams deliver 40% more reliable pipelines [Source: Data Engineering Research 2024].

Data engineering workflow:

  • Requirements - From analysts/stakeholders
  • Design - Data model, pipeline design
  • Develop - Build ETL/pipeline
  • Test - Data validation
  • Stage - Pre-production testing
  • Deploy - Production release
  • Monitor - Ongoing quality

    Data engineering labels

    | Label | Purpose |
    |---|---|
    | type-etl | ETL pipelines |
    | type-streaming | Real-time pipelines |
    | type-analytics | Analytics models |
    | type-infrastructure | Data infrastructure |
    | data-quality | Quality issues |
    | source-[name] | Data source |
    | destination-[name] | Data destination |
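
The naming convention in the label table can be sketched as a small helper that builds the label set for one pipeline task. This is only an illustration: `pipeline_labels` and `slug` are hypothetical names, and nothing here calls a GitScrum API.

```python
# Pipeline types taken from the type-* labels in the table above.
PIPELINE_TYPES = {"etl", "streaming", "analytics", "infrastructure"}


def slug(name):
    """Lowercase a system name and replace spaces so it fits in a label."""
    return name.strip().lower().replace(" ", "-")


def pipeline_labels(kind, source=None, destination=None):
    """Build the type-, source- and destination- labels for one task."""
    if kind not in PIPELINE_TYPES:
        raise ValueError(f"unknown pipeline type: {kind!r}")
    labels = [f"type-{kind}"]
    if source:
        labels.append(f"source-{slug(source)}")
    if destination:
        labels.append(f"destination-{slug(destination)}")
    return labels


# Example with made-up system names.
example_labels = pipeline_labels("etl", "Salesforce", "Data Warehouse")
```

Keeping the prefixes fixed makes it possible to filter a board by source or destination with a simple prefix match.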

    Pipeline task template

    ## Pipeline: [name]
    
    ### Details
    - Source: [system/table]
    - Destination: [warehouse/table]
    - Schedule: [cron/trigger]
    - SLA: [freshness requirement]
    
    ### Checklist
    - [ ] Schema design
    - [ ] Transform logic
    - [ ] Data validation
    - [ ] Performance test
    - [ ] Deploy to staging
    - [ ] Stakeholder review
    - [ ] Deploy to production
    - [ ] Monitor setup
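
If tasks are created in bulk, the template above could be filled in programmatically before being pasted into a task description. The following is a minimal sketch under that assumption; `render_pipeline_task` and its parameters are hypothetical, and the example values are made up.

```python
# Checklist items copied from the pipeline task template.
CHECKLIST = [
    "Schema design",
    "Transform logic",
    "Data validation",
    "Performance test",
    "Deploy to staging",
    "Stakeholder review",
    "Deploy to production",
    "Monitor setup",
]


def render_pipeline_task(name, source, destination, schedule, sla):
    """Return the pipeline task template as Markdown text."""
    lines = [
        f"## Pipeline: {name}",
        "",
        "### Details",
        f"- Source: {source}",
        f"- Destination: {destination}",
        f"- Schedule: {schedule}",
        f"- SLA: {sla}",
        "",
        "### Checklist",
    ]
    lines += [f"- [ ] {item}" for item in CHECKLIST]
    return "\n".join(lines)


# Example with made-up values for every field.
task_text = render_pipeline_task(
    name="orders_daily",
    source="crm.orders",
    destination="warehouse.fact_orders",
    schedule="0 2 * * *",
    sla="available by 06:00 UTC",
)
```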
    

    NoteVault data documentation

    | Document | Content |
    |---|---|
    | Data catalog | Tables, columns, types |
    | Data lineage | Source to destination |
    | Pipeline inventory | All pipelines |
    | Data dictionary | Business definitions |
    | SLAs | Freshness requirements |

    Columns for data projects

    | Column | Purpose |
    |---|---|
    | Backlog | All work |
    | Design | Schema, logic design |
    | Development | Building |
    | Testing | Data validation |
    | Staging | Pre-production |
    | Production | Deployed |

    Data quality tracking

    | Issue Type | Label |
    |---|---|
    | Missing data | dq-completeness |
    | Wrong data | dq-accuracy |
    | Late data | dq-timeliness |
    | Duplicate data | dq-uniqueness |
    | Format issues | dq-consistency |

    Stakeholder coordination

    | Stakeholder | Coordination |
    |---|---|
    | Analysts | Requirements, validation |
    | Business | SLA definition |
    | Engineering | Source system access |
    | Security | Data access controls |

    Pipeline dependencies

    | Dependency | Handling |
    |---|---|
    | Source changes | Linked tasks |
    | Upstream pipelines | Run order |
    | Schema changes | Migration tasks |
    | Infrastructure | Platform tasks |
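
For the "Upstream pipelines" row, a run order can be derived from the dependency map with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the pipeline names and the `run_order` helper are hypothetical, and how the order is then reflected in linked tasks is left to the team.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(upstream):
    """Return pipelines ordered so each one runs after its upstream pipelines.

    upstream maps a pipeline name to the set of pipelines it depends on.
    Raises graphlib.CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(upstream).static_order())


# Made-up dependency map: key depends on every pipeline in its value set.
upstream_example = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fact_orders": {"stg_orders", "stg_customers"},
    "daily_report": {"fact_orders"},
}

order = run_order(upstream_example)
```

Several valid orders may exist for the same map; the only guarantee is that every pipeline appears after all of its upstream pipelines.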

    Data testing checklist

    | Test | Verify |
    |---|---|
    | Row counts | Expected volume |
    | Null checks | Required fields |
    | Range checks | Valid values |
    | Referential | Key relationships |
    | Freshness | SLA compliance |
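
The five checks in the testing checklist can be sketched in plain Python over a list of row dictionaries. This is an illustration, not a validation framework: the function name `run_data_tests`, its parameters, and the sample rows are all assumptions, and a real pipeline would more likely run equivalent checks in SQL or a dedicated data-quality tool.

```python
from datetime import datetime, timedelta, timezone


def run_data_tests(rows, *, expected_min, expected_max, required, ranges,
                   foreign_keys, loaded_at, max_age, now=None):
    """Return {check name: passed} for the five checks in the checklist."""
    now = now or datetime.now(timezone.utc)
    return {
        # Row counts: volume falls inside the expected band.
        "row_counts": expected_min <= len(rows) <= expected_max,
        # Null checks: every required field is present and not None.
        "null_checks": all(
            r.get(f) is not None for r in rows for f in required),
        # Range checks: non-null values lie within [lo, hi].
        "range_checks": all(
            lo <= r[f] <= hi
            for r in rows
            for f, (lo, hi) in ranges.items()
            if r.get(f) is not None),
        # Referential: non-null keys exist in the referenced key set.
        "referential": all(
            r[f] in valid
            for r in rows
            for f, valid in foreign_keys.items()
            if r.get(f) is not None),
        # Freshness: the last load is no older than the SLA allows.
        "freshness": now - loaded_at <= max_age,
    }


# Made-up sample data and thresholds.
sample_rows = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.5},
]
results = run_data_tests(
    sample_rows,
    expected_min=1,
    expected_max=1000,
    required=["order_id", "customer_id"],
    ranges={"amount": (0, 10_000)},
    foreign_keys={"customer_id": {10, 11, 12}},
    loaded_at=datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc),
    max_age=timedelta(hours=6),
    now=datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
)
```

Any check that comes back `False` maps naturally onto one of the `dq-` labels, for example a failed null check onto `dq-completeness`.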

    Common data engineering issues

    | Issue | Solution |
    |---|---|
    | Undocumented pipelines | NoteVault requirement |
    | Quality issues | Monitoring tasks |
    | Long runs | Performance tasks |
    | Failed jobs | Incident workflow |

    Data team metrics

    | Metric | Track |
    |---|---|
    | Pipeline reliability | % successful runs |
    | Data freshness | SLA compliance |
    | Quality issues | Open DQ tasks |
    | Delivery time | Task cycle time |
