Data merging and Cleaning Assignment PDF


Added on  2021-05-31

Table of Contents
1. Introduction
2. Data merging and cleaning
3. RESTful web services
4. Mashups
5. Demo Running Instruction
Combine the two files
6. Conclusion
References
1. Introduction
Data and system integration combines data that is stored in different sources and
presents it to users through a single, unified view. System integration brings component
subsystems together into one aggregation of cooperating subsystems, so that the overall system
delivers its overarching function while each subsystem continues to work correctly. In data and
system integration, mashups and RESTful services follow an application-level approach that
offers multiple services to one user; each service has its own goal and serves a specific
purpose. REST is an architectural style for designing loosely coupled web services, and its
main purpose is to support fast, lightweight services. REST was designed for distributed
hypermedia applications.
2. Data merging and cleaning
In data merging and cleaning, the PETL library package is used to carry out ETL in
Python. PETL is a package for extracting, transforming and loading tables of data. It has no
installation dependencies beyond the Python core modules, which keeps installation and
maintenance simple, although some third-party packages are useful for optional PETL features.
ETL consists of three processes: extract, transform and load. It collects data and reports from
one or more data stores for analysis, and it can be implemented with scripts or with a dedicated
ETL tool. For data cleaning, ETL can generate output in formats such as JSON, XML or CSV.
Data cleaning means changing the information in a database so that it is correct and follows a
standard form, and the ETL process is a main part of that cleaning. An ETL system combines
many information sources, and those sources may represent the same data more than once. The
approach therefore has to detect and correct errors, both within a single information source and
across sources, and remove unwanted information. Data merging in an ETL process involves
identifying sub-processes that are shared between ETL processes. Flows of the same kind have
common sub-processes at some stage, and an ETL process can be rewritten in terms of them so
that common steps are repeated as little as possible. The individual results are combined using
transitive closure. Generating the results independently produces a more accurate overall result
at lower cost. The system provides a rule programming module, which makes it easy to find and
locate duplicates. This application uses a large amount of data, the data merging is performed
on a real-world database, and the final result is statistical data.
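The extract, clean, merge and load steps described above can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library, not the PETL package; the sample data, the field names (id, name, city) and the helper functions extract, clean, merge and load are all hypothetical, and duplicates are detected simply by matching id values rather than by transitive closure.

```python
import csv
import io
import json

# Hypothetical sample sources; names and values are illustrative only.
SOURCE_A = """id,name,city
1, Alice ,sydney
2,Bob,Melbourne
"""
SOURCE_B = """id,name,city
2,bob,melbourne
3,Carol,Perth
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(row):
    """Transform: trim whitespace and standardise capitalisation."""
    return {
        "id": int(row["id"].strip()),
        "name": row["name"].strip().title(),
        "city": row["city"].strip().title(),
    }


def merge(*sources):
    """Merge cleaned rows, keeping the first record seen for each id."""
    merged = {}
    for rows in sources:
        for row in map(clean, rows):
            merged.setdefault(row["id"], row)
    return [merged[key] for key in sorted(merged)]


def load(rows):
    """Load: serialise the merged rows as JSON text."""
    return json.dumps(rows, indent=2)


records = merge(extract(SOURCE_A), extract(SOURCE_B))
print(load(records))
```

Cleaning before merging matters here: the two records for id 2 differ only in capitalisation, so they become identical once standardised and the duplicate can be dropped safely.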
PETL Python package
petl is available from the Python Package Index and can be installed with pip:
$ pip install petl
To install manually, download and extract the package, then run the following command:
$ python setup.py install
To verify the installation, install nose and run the test suite:
$ pip install nose
$ nosetests -v petl
Python versions 2.7 and 3.4 are used here, and Python runs on both UNIX and
Windows operating systems.
ETL pipelines
This package makes extensive use of lazy evaluation and iterators. A pipeline
will not actually execute until the data is required.
For instance
>>> example_data = """foo,bar,baz
... a,1,3.4
... b,2,7.4
... c,6,2.2
... d,9,8.1
... """
>>> with open('example.csv', 'w') as f:
...     f.write(example_data)
...
petl.util.vis.look() is a function that requests data from a table. Calling it, or writing
the data to a file or database, is what causes the pipeline to execute.
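The idea that a pipeline does no work until data is requested can be illustrated with plain Python generators. This is only an analogy under stated assumptions, not petl's own implementation or API: the functions read_rows, convert and select_large and the rows_read counter are hypothetical names introduced for this sketch, and the data is the example table above.

```python
import csv
import io

EXAMPLE_DATA = """foo,bar,baz
a,1,3.4
b,2,7.4
c,6,2.2
d,9,8.1
"""

rows_read = 0  # counts rows actually pulled from the source


def read_rows(text):
    """Yield rows one at a time; nothing is read until a consumer asks."""
    global rows_read
    for row in csv.DictReader(io.StringIO(text)):
        rows_read += 1
        yield row


def convert(rows):
    """Lazily convert field types and upper-case the 'foo' column."""
    for row in rows:
        yield {
            "foo": row["foo"].upper(),
            "bar": int(row["bar"]),
            "baz": float(row["baz"]),
        }


def select_large(rows, threshold):
    """Lazily keep only rows whose 'bar' value exceeds the threshold."""
    return (row for row in rows if row["bar"] > threshold)


# Setting up the pipeline does not read any rows yet.
pipeline = select_large(convert(read_rows(EXAMPLE_DATA)), threshold=1)
rows_before = rows_read

# Requesting the data (here by building a list) runs the whole pipeline.
result = list(pipeline)
rows_after = rows_read

print(rows_before, rows_after)
for row in result:
    print(row)
```

The counter stays at zero while the steps are being chained and only advances once the rows are consumed, which is the same behaviour the text describes for a petl pipeline.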