The company's customer service unit faced significant challenges with its existing reporting platform, which led stakeholders to distrust the reported numbers. As it turned out, stakeholders had different perspectives on, and understandings of, processes and metrics. Dispersed and outdated technical and business documentation added to the inconsistency. To address these issues and become a truly data-driven organisation, the company needed a solid foundation for reliable analytics and informed decision-making.
Narwhal Data Solutions formed a small team of experienced data architects and engineers. Working closely with the client's management and technical staff, the team took a comprehensive approach to tackling the problems. The first step was to gather and consolidate all documentation into a unified repository. Business and technical definitions, assumptions, and processes were meticulously documented, and a unified KPI library was created to establish precise, unambiguous metric definitions. Through collaboration with key stakeholders, a shared understanding and consensus were achieved.
A modern cloud technology stack was employed: dbt (data build tool), Snowflake, and Power BI. Using Kimball star-schema modelling, we built a data warehouse that served as the single data source for reporting. The implementation revealed data inconsistencies and quality issues in the source systems, which were addressed in close collaboration with the client's teams. Clear data contracts were established and enforced through automated tests to ensure data integrity and detect issues early.
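The case study does not say how the contract tests were written. As a tool-agnostic illustration only, here is a minimal Python sketch of the idea: a contract declares each column's type, nullability, and uniqueness, and every incoming batch is checked against it so violations surface before they reach reports. The `ColumnContract` class, the `check_contract` function, and the ticket columns are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical contract for one column of a source table.
@dataclass(frozen=True)
class ColumnContract:
    name: str
    dtype: type
    nullable: bool = False
    unique: bool = False


def check_contract(rows: list[dict[str, Any]], contract: list[ColumnContract]) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    violations: list[str] = []
    for col in contract:
        seen: set[Any] = set()  # values already observed, for the uniqueness check
        for i, row in enumerate(rows):
            if col.name not in row:
                violations.append(f"row {i}: missing column '{col.name}'")
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    violations.append(f"row {i}: '{col.name}' is null")
                continue
            if not isinstance(value, col.dtype):
                violations.append(
                    f"row {i}: '{col.name}' expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
            if col.unique:
                if value in seen:
                    violations.append(f"row {i}: duplicate value {value!r} in '{col.name}'")
                seen.add(value)
    return violations


# Hypothetical contract and batch for a customer-service ticket table.
TICKETS_CONTRACT = [
    ColumnContract("ticket_id", int, unique=True),
    ColumnContract("channel", str),
    ColumnContract("closed_at", str, nullable=True),
]
batch = [
    {"ticket_id": 1, "channel": "email", "closed_at": None},
    {"ticket_id": 1, "channel": None, "closed_at": "2023-05-01"},
]
for violation in check_contract(batch, TICKETS_CONTRACT):
    print(violation)
```

In a dbt project, comparable checks are usually declared as data tests (for example `not_null` and `unique`) rather than written by hand. The sketch only shows what such tests verify.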
The successful implementation of the solution significantly improved the organisation's business intelligence capabilities. Data quality and reporting integrity were greatly enhanced, providing stakeholders with accurate and reliable information. The new data warehouse served as a solid foundation for reporting and analytics, while comprehensive documentation, tests, and CI/CD pipelines made maintenance and future enhancements easier.
With a competent business intelligence team now in charge of maintaining and developing the platform, Narwhal Data Solutions continues to provide support and guidance as needed.
A rapidly expanding scale-up struggled to obtain relevant and accurate data to support decision-making across its departments. The existing legacy business intelligence (BI) system relied on individual reports querying a transactional database directly. This led to redundant, complex logic and, as data volume grew, to performance issues. To address these challenges, the company partnered with Narwhal Data Solutions to establish a dedicated data team and develop a robust data platform.
The solution involved implementing a cloud data warehouse on AWS, with PostgreSQL as the open-source database of choice. Through workshops, the team gathered information on key business processes, data sources, and reporting requirements, and used this knowledge to design the architecture of the new data platform. Power BI was selected as the reporting tool, including its recently released Datamarts feature.
Following the Kimball methodology, the data team modelled complex logistics processes as a star schema composed of multiple facts and dimensions. Native PostgreSQL methods were used for data loading, and dbt (data build tool) handled data transformations, documentation, and data tests. Performance optimisation was crucial given the large data volume (~1 TB), high expectations for data freshness, and the scalability requirements.
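To show what "a star schema composed of multiple facts and dimensions" means in practice, here is a minimal Python sketch. A fact table stores foreign keys plus additive measures, and each query joins the facts to a dimension and aggregates by one of that dimension's attributes. All table names, columns, and values are hypothetical and not taken from the client's model. In the warehouse itself this would be a SQL join with `GROUP BY`.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key.
dim_warehouse = {
    1: {"warehouse_name": "Berlin", "country": "DE"},
    2: {"warehouse_name": "Warsaw", "country": "PL"},
}
dim_date = {
    20240101: {"date": "2024-01-01", "month": "2024-01"},
    20240102: {"date": "2024-01-02", "month": "2024-01"},
}

# Hypothetical fact table at the grain "one row per warehouse per day":
# foreign keys to the dimensions plus additive numeric measures.
fact_daily_shipments = [
    {"date_key": 20240101, "warehouse_key": 1, "parcels": 120, "weight_kg": 340.0},
    {"date_key": 20240101, "warehouse_key": 2, "parcels": 80, "weight_kg": 210.5},
    {"date_key": 20240102, "warehouse_key": 1, "parcels": 95, "weight_kg": 250.0},
]


def sum_measure_by(facts, dim, fk, attribute, measure):
    """Star join: resolve each fact row's dimension row via its foreign key,
    then sum the measure per dimension attribute.
    SQL equivalent: SELECT d.<attribute>, SUM(f.<measure>)
                    FROM facts f JOIN dim d ON f.<fk> = d.key GROUP BY d.<attribute>
    """
    totals = defaultdict(float)
    for row in facts:
        group = dim[row[fk]][attribute]
        totals[group] += row[measure]
    return dict(totals)


by_country = sum_measure_by(fact_daily_shipments, dim_warehouse, "warehouse_key", "country", "parcels")
by_month = sum_measure_by(fact_daily_shipments, dim_date, "date_key", "month", "weight_kg")
print(by_country)  # {'DE': 215.0, 'PL': 80.0}
print(by_month)    # {'2024-01': 800.5}
```

The same fact table can be sliced by any dimension it references. This is why a small set of well-defined facts and conformed dimensions can replace the per-report logic of the legacy system.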
The new data platform brought significant improvements to the client's business. The centralised cloud data warehouse delivered relevant, accurate, and consistent information on time. Power BI enabled users at all levels to make informed decisions with confidence. Data pipelines were designed to refresh the data every 4 hours, with the entire load and transformation process taking slightly over 1 hour for the 1 TB data warehouse. As a result, users always had access to up-to-date information from the current day.
The implemented data platform sets the foundation for future enhancements and scalability as the company continues to evolve. It accommodates increasing data demands and enables streamlined reporting processes.
A sports tech startup operated a data platform with Amazon Redshift at its centre, supported by AWS Glue and PySpark. ETL jobs were orchestrated with Apache Airflow.
The DWH served internal business intelligence analytics and also provided the analytical data used by one of the company's digital products. As data volumes grew, performance issues became evident. The ambitious but small data team lacked both the manpower and, in some areas, the deep technical expertise to tackle the problems in the long run.
They wisely decided to bring in an external consultant from Narwhal Data Solutions to improve the existing system.
We identified several flaws in the existing data infrastructure that caused both poor performance and unnecessary operating costs. After careful analysis of the data and of typical queries, we implemented a new data partitioning scheme. It produced a more balanced data distribution, reduced skew, and improved cluster performance. Improved ETL processes further reduced the cluster load. Average query execution time improved by a factor of 2 to 3, and queries with unacceptably long run times were practically eliminated.
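Why does the choice of partitioning (distribution) key affect skew? In Redshift, rows of a table with key distribution are assigned to slices by hashing the distribution-key value. A key dominated by a few values therefore piles most rows onto one slice, and that slice becomes the bottleneck for every query. The Python sketch below simulates this under stated assumptions. MD5 stands in for Redshift's internal hash. The `team_id`/`event_id` columns and the 90% concentration are hypothetical. The skew ratio mirrors the `skew_rows` idea in Redshift's `SVV_TABLE_INFO` (rows on the fullest slice divided by rows on the emptiest). The source does not say which keys were actually chosen.

```python
import hashlib
from collections import Counter


def slice_of(key, n_slices):
    """Map a distribution-key value to a slice.
    MD5 is only a stand-in; Redshift's internal hash function differs."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slices


def rows_per_slice(keys, n_slices):
    """Count how many rows land on each slice for the given distribution keys."""
    counts = Counter(slice_of(k, n_slices) for k in keys)
    return [counts.get(s, 0) for s in range(n_slices)]


def skew_ratio(counts):
    """Rows on the fullest slice / rows on the emptiest slice (1.0 = perfectly even)."""
    lo = min(counts)
    return float("inf") if lo == 0 else max(counts) / lo


N_SLICES = 4
# Hypothetical table of 10,000 events in which one team produces 90% of the rows.
team_ids = ["team_A"] * 9000 + [f"team_{i}" for i in range(100) for _ in range(10)]
event_ids = range(10_000)  # hypothetical high-cardinality key, one value per row

skewed = rows_per_slice(team_ids, N_SLICES)
balanced = rows_per_slice(event_ids, N_SLICES)
print("distributed by team_id: ", skewed, round(skew_ratio(skewed), 1))
print("distributed by event_id:", balanced, round(skew_ratio(balanced), 2))
```

A high-cardinality key spreads rows evenly. However, a real distribution key is also chosen so that frequently joined tables are co-located on the same slices. That is why the decision depends on analysing typical queries, not only on row counts.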
Next, we proposed changes to the data architecture that could further improve system performance even as data volumes grow. At the same time, the proposed changes could significantly reduce costs by shrinking the Redshift cluster and moving a large part of the workload to PostgreSQL, a database better suited to the observed usage pattern. An MVP was built to show the operating principles and demonstrate the performance and scalability potential of the solution. It was then handed over to the internal team for further development.
A startup in the renewable energy sector collected and analysed data originating from SCADA systems. The data infrastructure and analytics platform were operated on Azure, with a DWH built according to the Data Vault methodology and running on SQL Server as the centrepiece. Because part of the analysed data had to be presented in near real time, data were loaded and processed continuously (streaming / mini-batches).
Despite a relatively low data volume (less than 1 TB), performance issues started to become visible. The company expected rapid growth and wanted to prepare its data platform to scale up by a factor of 100 in the near future. It asked Narwhal Data Solutions to review the current system, the existing requirements, and the performance issues, and ultimately to propose a new, scalable architecture.
Several architecture options were proposed. They ranged from an improved version of the existing system migrated to Azure SQL Database Hyperscale, to a stream processing system based on Apache Kafka and ksqlDB or Apache Flink. Several other technology options were also carefully evaluated. Both functional and non-functional requirements were taken into account, the pros and cons of each proposed architecture were highlighted, the total cost of ownership (TCO) was estimated, and an implementation and migration roadmap was drafted.
Objective
We aimed to streamline the integration of macroeconomic data for a client in the finance and analytics sector.
Challenges
Our client needed efficient consolidation of macroeconomic statistics from various sources and automation of data collection and storage.
Solution
We used Apache Camel K, an integration framework built on Apache Camel that runs natively on Kubernetes, with Kubernetes handling orchestration.
Result
The project made data integration efficient and reliable, and made it adaptable to changes in data sources and formats.
Comment
We hired Narwhal Data Solutions to help us set up a data warehouse using AWS RDS and dbt, for consumption by Microsoft Power BI.
Marek Strzelczyk and his team have been nothing but professional, supportive, flexible, polite and friendly. With their help we've been able to set up a comprehensive reporting and analytics platform for our logistics and ecommerce data. Their experience and advice on the setup of the ETL pipeline and the general data architecture have been invaluable.