Architecture
- Pipeline
  - Apache Beam
  - GCP Dataflow
- Messaging
  - GCP Cloud Pub/Sub
  - Apache Kafka
- Logging
  - Fluentd
- Monitoring
  - Prometheus
  - Datadog
  - Grafana
  - Zabbix
- Workflow/job scheduler
  - Apache Airflow
  - Luigi
  - Jenkins
  - kuroko2
  - Digdag
- BI
  - Tableau
  - Redash
  - Looker
- Streaming
  - Apache Kafka
  - Amazon Kinesis Streams
  - Amazon Kinesis Firehose
  - Apache Spark Streaming
- Batch transfer
  - Embulk
- Data storage
  - GCP BigQuery
  - Amazon S3
  - GCP Cloud Storage
  - Apache Hadoop
- Database
  - MySQL
  - MongoDB
  - PostgreSQL
- Search server
  - Hyper Estraier
  - Apache Solr
  - Elasticsearch
- CI/CD
  - CircleCI
  - Travis CI
  - Spinnaker
- Communication
  - Slack
  - HipChat
- Provisioning
  - Terraform
  - Chef
  - Ansible
- Container management
  - Kubernetes
- Unclassified so far
  - AWS Lambda
  - GCP Cloud Functions
  - Redis
  - Amazon DynamoDB
  - Amazon Athena
  - Amazon Redshift
  - Apache Hive
  - OWASP ZAP
- Data acquisition
  - streaming
  - batch
- Store raw data
- Data processing
  - feature engineering
- Store processed data
- Learning/training models
  - feature selection
- Prediction
- A/B testing
Service
Preprocessing
- Renaming
- Filling missing values
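
The two preprocessing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a description of the actual service: the column names (`usr_id`, `user_id`, `price`), the sample rows, and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean


def rename_columns(rows, mapping):
    """Return new rows with keys renamed per `mapping`; unmapped keys are kept."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def fill_missing(rows, column, default=None):
    """Replace missing (None or absent) values in `column`.

    Uses `default` when given, otherwise the mean of the observed values.
    Raises statistics.StatisticsError if nothing is observed and no default is set.
    """
    observed = [row[column] for row in rows if row.get(column) is not None]
    fill_value = default if default is not None else mean(observed)
    return [
        {**row, column: fill_value if row.get(column) is None else row[column]}
        for row in rows
    ]


# Hypothetical raw records; the field names are illustrative only.
raw = [
    {"usr_id": 1, "price": 100.0},
    {"usr_id": 2, "price": None},
    {"usr_id": 3, "price": 200.0},
]

renamed = rename_columns(raw, {"usr_id": "user_id"})
cleaned = fill_missing(renamed, "price")
print(cleaned)
```

Both helpers return new lists instead of mutating their input, so the raw data stays available for the "store raw data" stage.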
Feature Engineering
- why does this component exist?
- which components does this component communicate with?
- expected inputs
  - streaming
  - batch
- outputs
  - streaming
  - batch
- storage for this component
  - structured knowledge of the data, such as the meaning of a URL
    - for instance, a third-party URL implicitly carries information about the page: in http://example.com/items/1/1234, the /1/ segment is the item category, such as clothes or games
- Rescaling
- Discretization
- Aggregation
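
The feature engineering ideas listed above (reading the category out of a URL, rescaling, discretization, aggregation) can be sketched as follows. This is a rough illustration under stated assumptions, not the component's actual design: the helper names, the `seconds` field, the bin edges, the choice of min-max rescaling, and every sample URL other than http://example.com/items/1/1234 are made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def item_category(url):
    """Return the path segment after 'items' (assumed to be the category id), or None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "items":
        return segments[1]
    return None


def rescale(values):
    """Min-max rescaling to [0, 1]; a constant input maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def discretize(value, edges):
    """Return the index of the bin holding `value`, given ascending bin edges."""
    return sum(1 for edge in edges if value >= edge)


def aggregate_mean(rows, key, field):
    """Group rows by `key` and average `field` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical page-view records; only the first URL comes from the notes above.
views = [
    {"url": "http://example.com/items/1/1234", "seconds": 10},
    {"url": "http://example.com/items/1/5678", "seconds": 30},
    {"url": "http://example.com/items/2/42", "seconds": 50},
]
for view in views:
    view["category"] = item_category(view["url"])

scaled = rescale([v["seconds"] for v in views])
bins = [discretize(v["seconds"], [20, 40]) for v in views]
by_category = aggregate_mean(views, "category", "seconds")
print(scaled, bins, by_category)
```

The mapping from a category id such as `1` to a label such as "clothes" is the kind of structured knowledge the notes suggest keeping in the component's storage; it is left out here because the source does not specify it.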
Learning/Training
- reporting
  - model metrics such as precision and accuracy
- feature selection
- sampling
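
The reporting metrics mentioned above can be computed as in the sketch below. It assumes binary labels with `1` as the positive class; the label vectors are invented for illustration and are not results from any real model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive.

    Returns 0.0 when nothing was predicted positive (a convention chosen here).
    """
    labels_of_predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not labels_of_predicted_positive:
        return 0.0
    true_positive = sum(1 for t in labels_of_predicted_positive if t == positive)
    return true_positive / len(labels_of_predicted_positive)


# Made-up labels and predictions for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
```

Reporting both numbers side by side is useful because accuracy alone can look good on imbalanced data while precision for the positive class stays low.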