Tute 8
1. Discuss the role of data in information systems indicating the
need for data persistence
What Is an Information System?
At the most basic level, an information system (IS) is
a set of components that work together to manage data processing and storage.
Its role is to support the key aspects of running an organization, such as
communication, record-keeping, decision making, data analysis and more.
Companies use this information to improve their business operations, make
strategic decisions and gain a competitive edge.
Information systems typically include a combination of
software, hardware and telecommunication networks. For example, an organization
may use customer relationship management systems to gain a better understanding
of its target audience, acquire new customers and retain existing clients. This
technology allows companies to gather and analyze sales activity data, define the exact target group of a marketing campaign and measure customer satisfaction. All of this depends on data persistence: the data an information system captures must be stored durably, typically in files or databases, so that it survives program and system restarts and remains available for later processing, analysis and decision making.
The Benefits of Information Systems
Modern technology can significantly boost your
company's performance and productivity. Information systems are no exception.
Organizations worldwide rely on them to research and develop new ways to
generate revenue, engage customers and streamline time-consuming tasks.
With an information system, businesses can save time
and money while making smarter decisions. A company's internal departments,
such as marketing and sales, can communicate better and share information more
easily.
Since this technology is automated and uses complex
algorithms, it reduces human error. Furthermore, employees can focus on the
core aspects of a business rather than spending hours collecting data, filling
out paperwork and doing manual analysis.
Thanks to modern information systems, team members can
access massive amounts of data from one platform. For example, they can gather
and process information from different sources, such as vendors, customers,
warehouses and sales agents, with a few mouse clicks.
Uses and Applications
There are different types of information systems and
each has a different role. Business intelligence (BI) systems, for instance,
can turn data into valuable insights.
This kind of technology allows for faster, more
accurate reporting, better business decisions and more efficient resource
allocation. Another major benefit is data visualization, which enables analysts
to interpret large amounts of information, predict future events and find
patterns in historical data.
Organizations can also use enterprise resource
planning (ERP) software to collect, manage and analyze data across different
areas, from manufacturing to finance and accounting. This type of information
system consists of multiple applications that provide a 360-degree view of
business operations. NetSuite ERP, PeopleSoft, Odoo and Intacct are just a few
examples of ERP software.
Like other information systems, ERP provides
actionable insights and helps you decide on the next steps. It also makes it
easier to achieve regulatory compliance, increase data security and share
information between departments. Additionally, it helps to ensure that all of
your financial records are accurate and up-to-date.
In the long run, ERP software can reduce operational
costs, improve collaboration and boost your revenue. Nearly half of the
companies that implement this system report major benefits within six months.
At the end of the day, information systems can give
you a competitive advantage and provide the data you need to make faster,
smarter business decisions. Depending on your needs, you can opt for
transaction processing systems, knowledge management systems, decision support
systems and more. When choosing one, consider your budget, industry and
business size. Look for an information system that aligns with your goals and
can streamline your day-to-day operations.
2. Explain the terms: Data,
Database, Database Server, and Database Management System
Data
Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DAT-ə, /ˈdɑːtə/ DAH-tə)[1] is
a set of values of subjects with respect to qualitative or quantitative variables.
Data and information or knowledge are often used interchangeably; however, data becomes information when it is viewed in context or in post-analysis.[2]
While the concept of data is commonly associated with scientific research, data is collected by a
huge range of organizations and institutions, including businesses (e.g., sales
data, revenue, profits, stock price), governments (e.g., crime rates, unemployment
rates, literacy rates) and non-governmental organizations (e.g.,
censuses of the number of homeless
people by non-profit organizations).
Data is measured, collected and
reported, and analyzed,
whereupon it can be visualized using graphs, images or other
analysis tools. Data as a general concept refers
to the fact that some existing information or knowledge is represented or coded in
some form suitable for better usage or processing. Raw data ("unprocessed
data") is a collection of numbers or characters before it has been
"cleaned" and corrected by researchers. Raw data needs to be
corrected to remove outliers or obvious instrument or data entry errors
(e.g., a thermometer reading from an outdoor Arctic location recording a
tropical temperature). Data processing commonly occurs by stages, and the
"processed data" from one stage may be considered the "raw
data" of the next stage. Field data is
raw data that is collected in an uncontrolled "in situ"
environment. Experimental data is data that is
generated within the context of a scientific investigation by observation and
recording. Data has been described as the new oil of
the digital economy.[3][4]
Database
A database is
an organized collection of data,
generally stored and accessed electronically from a computer system. Where
databases are more complex they are often developed using formal design and
modeling techniques.
The database
management system (DBMS) is the software that
interacts with end users, applications, and the database itself to capture
and analyze the data. The DBMS software additionally encompasses the core
facilities provided to administer the database. The sum total of the database,
the DBMS and the associated applications can be referred to as a "database
system". Often the term "database" is also used to loosely refer
to any of the DBMS, the database system or an application associated with the
database.
Computer scientists
may classify database-management systems according to the database
models that they support. Relational databases became dominant in the
1980s. These model data as rows and columns in
a series of tables, and the vast majority use SQL for writing and
querying data. In the 2000s, non-relational databases became popular, referred
to as NoSQL because
they use different query languages.
Formally, a
"database" refers to a set of related data and the way it is
organized. Access to this data is usually provided by a "database
management system" (DBMS) consisting of an integrated set of computer
software that allows users to
interact with one or more databases and provides access to all of the data
contained in the database (although restrictions may exist that limit access to
particular data). The DBMS provides various functions that allow entry, storage
and retrieval of large quantities of information and provides ways to manage
how that information is organized.
Because of the close
relationship between them, the term "database" is often used casually
to refer to both a database and the DBMS used to manipulate it.
Outside the world of
professional information technology, the term database is
often used to refer to any collection of related data (such as a spreadsheet or
a card index) as size and usage requirements typically necessitate use of a
database management system.
Existing DBMSs provide
various functions that allow management of a database and its data which can be
classified into four main functional groups:
· Data definition – Creation,
modification and removal of definitions that define the organization of the
data.
· Update – Insertion,
modification, and deletion of the actual data.
· Retrieval – Providing
information in a form directly usable or for further processing by other
applications. The retrieved data may be made available in a form basically the
same as it is stored in the database or in a new form obtained by altering or
combining existing data from the database.
· Administration – Registering
and monitoring users, enforcing data security, monitoring performance,
maintaining data integrity, dealing with concurrency control, and recovering
information that has been corrupted by some event such as an unexpected system
failure.[4]
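To make the first three groups concrete, here is a minimal Java (JDBC) sketch; the in-memory H2 connection URL and the students table are assumptions invented for this illustration, not part of the text above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbmsFunctionsDemo {
    public static void main(String[] args) throws Exception {
        // Assumed in-memory H2 database; any JDBC-compliant DBMS would work,
        // provided its driver is on the classpath.
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:school");
             Statement st = con.createStatement()) {

            // Data definition: create the structure that organizes the data.
            st.executeUpdate("CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(50))");

            // Update: insert (or modify, or delete) the actual data.
            st.executeUpdate("INSERT INTO students VALUES (1, 'Alice')");
            st.executeUpdate("INSERT INTO students VALUES (2, 'Bob')");

            // Retrieval: read the data back in a directly usable form.
            try (ResultSet rs = st.executeQuery("SELECT id, name FROM students ORDER BY id")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }

            // Administration (users, security, backup, recovery) is normally done
            // with DBMS-specific tools and statements such as GRANT and REVOKE.
        }
    }
}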
Both a database and
its DBMS conform to the principles of a particular database
model."Database system" refers collectively to the
database model, database management system, and database.
Physically,
database servers are dedicated computers that hold
the actual databases and run only the DBMS and related software. Database
servers are usually multiprocessor computers, with generous memory
and RAID disk arrays used
for stable storage. RAID is used for recovery of data if any of the disks fail.
Hardware database accelerators, connected to one or more servers via a
high-speed channel, are also used in large volume transaction processing
environments. DBMSs are found at the heart of most database applications. DBMSs may be built
around a custom multitasking kernel with built-in networking support,
but modern DBMSs typically rely on a standard operating
system to provide these functions.
Since DBMSs comprise a
significant market, computer and storage vendors often take
into account DBMS requirements in their own development plans.[7]
Databases and DBMSs
can be categorized according to the database model(s) that they support (such
as relational or XML), the type(s) of computer they run on (from a server
cluster to a mobile phone), the query
language(s) used to access the database (such as SQL or XQuery),
and their internal engineering, which affects performance, scalability,
resilience, and security.
Database
Server
A database
server is a server which houses a database application that
provides database services
to other computer programs or to computers,
as defined by the client–server model.[1][2] Database management systems (DBMSs)
frequently provide database-server functionality, and some database management
systems (such as MySQL) rely exclusively on the client–server model for
database access (while others, e.g., SQLite, are meant for use as an embedded
database).
Users access a
database server either through a "front end" running on the user's
computer – which displays requested data – or through the "back end", which runs on the server and
handles tasks such as data analysis and storage.
In a master-slave model, database master
servers are central and primary locations of data while database slave servers
are synchronized backups of the master acting as proxies.
Most database
applications respond to a query
language. Each database understands its query language and converts
each submitted query to server-readable form and
executes it to retrieve results.
Examples of
proprietary database applications include Oracle, DB2, Informix,
and Microsoft SQL Server. Examples of free software database
applications include PostgreSQL and, under the GNU General Public Licence, Ingres and MySQL. Every server uses
its own query logic and structure. The SQL (Structured Query
Language) query language is more or less the same on all relational database applications.
For clarification, a
database server is simply a server that maintains services related to clients
via database applications.
Database
Management System
A database management
system (DBMS) is system software for creating and managing databases.
The DBMS provides users and programmers with a systematic way to create,
retrieve, update and manage data.
A DBMS makes it
possible for end users to create, read, update and delete data in
a database. The DBMS essentially serves as an interface between the database and
end users or application
programs, ensuring that data is consistently organized and remains
easily accessible.
The DBMS manages three
important things: the data; the database engine, which allows data to be accessed, locked and modified; and the database schema, which defines the database's logical structure. These three foundational
elements help provide concurrency,
security, data
integrity and uniform administration procedures. Typical
database administration tasks supported by the DBMS include change
management, performance monitoring/tuning and backup and recovery.
Many database management systems are also responsible for automated rollbacks,
restarts and recovery as well as the logging and auditing of
activity.
The DBMS is perhaps
most useful for providing a centralized view of data that can be accessed by
multiple users, from multiple locations, in a controlled manner. A DBMS can
limit what data the end user sees, as well as how that end user can view the
data, providing many views of a single database schema. End users and software
programs are free from having to understand where the data is physically
located or on what type of storage media it resides because the DBMS handles
all requests.
The DBMS can offer
both logical and physical data independence. That means it can protect users
and applications from needing to know where data is stored or having to be
concerned about changes to the physical structure of data (storage and
hardware). As long as programs use the application programming interface (API)
for the database that is provided by the DBMS, developers won't have to modify
programs just because changes have been made to the database.
With relational DBMSs
(RDBMSs),
this API is SQL,
a standard programming language for defining, protecting and accessing data in
an RDBMS.
Popular types of DBMSes
Popular database
models and their management systems include:
Relational database
management system (RDBMS) - adaptable to most use cases, but Tier-1 RDBMS products
can be quite expensive.
NoSQL DBMS -
well-suited for loosely defined data structures that may evolve over
time.
In-memory database
management system (IMDBMS) - provides faster response times and better
performance.
Columnar database
management system (CDBMS) - well-suited for data
warehouses that have a large number of similar data items.
Cloud-based data
management system - the cloud service provider
is responsible for providing and maintaining the DBMS.
Advantages of a DBMS
Using a DBMS to store
and manage data comes with advantages, but also overhead. One of the biggest
advantages of using a DBMS is that it lets end users and application
programmers access and use the same data while managing data integrity. Data is
better protected and maintained when it can be shared using a DBMS instead of
creating new iterations of the same data stored in new files for every new
application. The DBMS provides a central store of data that can be accessed by
multiple users in a controlled manner.
Central storage and
management of data within the DBMS provides:
· Data abstraction and
independence
· Data security
· A locking mechanism
for concurrent access
· An efficient handler
to balance the needs of multiple applications using the same data
· The ability to swiftly
recover from crashes and errors, including restartability and recoverability
· Robust data integrity
capabilities
· Logging and auditing
of activity
· Simple access using a
standard application programming interface (API)
· Uniform administration
procedures for data
Another advantage of a
DBMS is that it can be used to impose a logical, structured organization on the
data. A DBMS delivers economy of scale for processing large amounts of data
because it is optimized for such operations.
A DBMS can also
provide many views of a single database schema. A view defines what data the
user sees and how that user sees the data. The DBMS provides a level of
abstraction between the conceptual schema that defines the logical structure of
the database and the physical schema that describes the files, indexes and
other physical mechanisms used by the database. When a DBMS is used, systems
can be modified much more easily when business requirements change. New
categories of data can be added to the database without disrupting the existing
system and applications can be insulated from how data is structured and
stored.
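As a small hedged sketch of this idea (the employees table, the employee_names view and the H2 connection URL are invented for illustration), a view can expose only part of a table, so applications see a restricted logical structure instead of the stored one:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ViewDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:company");
             Statement st = con.createStatement()) {
            // The base table holds more than any single user needs to see.
            st.executeUpdate("CREATE TABLE employees (id INT, name VARCHAR(50), salary DECIMAL(10,2))");
            st.executeUpdate("INSERT INTO employees VALUES (1, 'Alice', 75000), (2, 'Bob', 62000)");

            // The view exposes only part of the schema: names, but not salaries.
            st.executeUpdate("CREATE VIEW employee_names AS SELECT id, name FROM employees");

            // End users query the view; the DBMS hides the underlying structure.
            try (ResultSet rs = st.executeQuery("SELECT * FROM employee_names")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}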
Of course, a DBMS must
perform additional work to provide these advantages, thereby bringing with it
the overhead. A DBMS will use more memory and CPU than
a simple file storage system. And, of course, different types of DBMSes will require
different types and levels of system resources.
3. Compare Files and Databases, discussing pros and cons of them
File
A data file is a collection of
related records stored on a storage medium such as a hard disk or optical disc. A
Student file at a school might consist of thousands of individual student
records. Each student record in the file contains the same fields. Each field,
however, contains different data. For example, a small sample Student file might contain four student records, each with eleven fields. A database
includes a group of related data files.
Database
A database is a collection of
data organized in a manner that allows access, retrieval, and use of that data. Data is a collection of unprocessed items, which
can include text, numbers, images, audio, and video. Information is processed
data; that is, it is organized, meaningful, and useful.
Computers
process data in a database into information. A database at a school, for
example, contains data about students, e.g., student data, class data, etc.
A computer at the school processes new student data and then sends
advising appointment and ID card information to the printers.
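The contrast can be sketched in Java as follows; the students.csv file, its comma-separated layout and the students table are assumptions made only for this example. With the file, the program itself must know and parse the record layout; with the database, the DBMS manages the structure and answers queries.

import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

public class FileVsDatabaseDemo {
    public static void main(String[] args) throws Exception {
        // File approach: the application itself must know the record layout.
        Path studentFile = Path.of("students.csv");                  // hypothetical file
        Files.write(studentFile, List.of("1,Alice,Colombo", "2,Bob,Kandy"));
        for (String record : Files.readAllLines(studentFile)) {
            String[] fields = record.split(",");                     // id, name, city
            System.out.println("File record: " + fields[1] + " from " + fields[2]);
        }

        // Database approach: the DBMS manages structure, access and integrity,
        // and answers questions through queries instead of manual parsing.
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:school");
             Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE students (id INT, name VARCHAR(50), city VARCHAR(50))");
            st.executeUpdate("INSERT INTO students VALUES (1, 'Alice', 'Colombo'), (2, 'Bob', 'Kandy')");
            try (ResultSet rs = st.executeQuery("SELECT name FROM students WHERE city = 'Kandy'")) {
                while (rs.next()) {
                    System.out.println("Database record: " + rs.getString("name"));
                }
            }
        }
    }
}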
4. Discuss different arrangements of data, giving examples for each
Linear
arrangement
A Linear
arrangement can be defined as a straight line arrangement typically involving
not more than two dimensions. The key factor to be noted here is that
arrangements are done only on one axis. When A is said to be on the left or
ahead of B, in a linear arrangement, it cannot be assumed that A is to the
immediate left of B or immediately ahead of B unless it is mentioned so
specifically.
The
directions given are relative in nature, as they depend on the perspective from which the test-taker decides the directions. For example, if four people P, Q, R,
S are sitting at a table from left to right in the same order, then Q is
sitting to the left of R but to the right of P. Change in orientation, left and
right, depends on two possible scenarios i.e. whether the test-taker assumes
people to be facing the direction he is facing or whether he assumes them to be
facing the opposite direction. But as long as consistency is maintained in
incorporating the directions, this fact should not change the solution as the two
scenarios are mirror images of each other.
Circular
arrangement
A Circular
arrangement can be defined as an arrangement having a closed loop. Typical
examples include situations wherein seating arrangements around a table have to
be made. The table can be of any shape and need not necessarily be circular.
(Diagrams of tables of various shapes illustrate this, but are not reproduced here.) Though such diagrams can look very different in structure, there would be minimal deviation in the interpretation of common clues across all of them.
For
example, A is sitting opposite to D. B is sitting to the immediate left of A. B
is sitting between A and C.
Complex
arrangement
Complex
arrangements are arrangements which involve more than two dimensions. The
approach for these problems should be very similar to that of the linear
arrangement problems except for the fact that the logical framework for
interpreting the problem assumes special significance in this case. A lot of
information needs to be comprehended in a complex arrangement problem, and
hence, care should be taken to use an appropriate framework that aids the smooth fitting and assimilation of the data.
5. Explain different types of databases, providing examples for
their use
1. Centralised Database
The
information (data) is stored at a centralized location and the users from
different locations can access this data. This type of database contains
application procedures that help the users to access the data even from a
remote location.
Various
kinds of authentication procedures are applied for the verification and
validation of end users; likewise, the application procedures provide a registration number that keeps track of and records data usage. This is handled by the local area office.
2. Distributed Database
Just the opposite of the centralized database concept, a distributed database has contributions from the common database as well as information captured by local computers. The data is not held in one place; it is distributed across various sites of an organization. These sites are connected to each other by communication links, which helps them access the distributed data easily.
You can imagine a distributed database as one in which various portions of a database are stored in multiple different physical locations, along with the application procedures, which are replicated and distributed among various points in a network.
There are two kinds of distributed database: homogeneous and heterogeneous. Databases that share the same underlying hardware and run the same operating systems and application procedures at all physical locations are known as a homogeneous DDB. In contrast, when the operating systems, underlying hardware and application procedures can differ at the various sites, the DDB is known as heterogeneous.
3. Personal Database
Data is collected and stored on personal computers, so the database is small and easily manageable. The data is generally used by the same department of an organization and is accessed by a small group of people.
4. End User Database
The end user is usually not concerned with the transactions or operations performed at various levels and is only aware of the product, which may be a software package or an application. This is a shared database specifically designed for end users, such as managers at different levels. A summary of the information is collected in this database.
5. Commercial Database
These are paid versions of huge databases designed for users who want to access the information for help. These databases are subject-specific, and a single user cannot afford to maintain such a huge amount of information. Access to such databases is provided through commercial links.
6. NoSQL Database
These are used for large sets of distributed data. Some big data performance issues are not handled effectively by relational databases; such issues are easily managed by NoSQL databases. They are very efficient at analyzing large volumes of unstructured data that may be stored on multiple virtual servers in the cloud.
7. Operational Database
Information
related to operations of an enterprise is stored inside this database.
Functional lines such as marketing, employee relations and customer service require such databases.
8. Relational Databases
These databases are organized as a set of tables in which data fits into pre-defined categories. Each table consists of rows and columns, where each column holds data for a specific category and each row contains an instance of that data. The Structured Query Language (SQL) is the standard user and application program interface for a relational database.
Various simple operations can be applied to the tables, which makes these databases easy to extend and to join with other tables through a common relation, and allows new categories of data to be added without modifying all existing applications.
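For example, a join combines rows from two tables through a common column. The following hedged sketch (table names, data and the H2 connection URL are invented for illustration) joins a students table to an enrolments table on the student id:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JoinDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:school");
             Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(50))");
            st.executeUpdate("CREATE TABLE enrolments (student_id INT, course VARCHAR(50))");
            st.executeUpdate("INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob')");
            st.executeUpdate("INSERT INTO enrolments VALUES (1, 'Databases'), (2, 'Networks')");

            // The shared column (students.id = enrolments.student_id) relates the tables.
            String sql = "SELECT s.name, e.course FROM students s "
                       + "JOIN enrolments e ON s.id = e.student_id";
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " takes " + rs.getString("course"));
                }
            }
        }
    }
}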
9. Cloud Databases
Nowadays, data is increasingly stored in the cloud, also known as a virtual environment, whether a hybrid, public or private cloud. A cloud database is a database that has been optimized or built for such a virtualized environment. Benefits of a cloud database include the ability to pay for storage capacity and bandwidth on a per-user basis, scalability on demand and high availability.
10. Object-Oriented Databases
An object-oriented database combines object-oriented programming with relational database concepts. Items created using object-oriented programming languages such as C++ or Java can be stored in relational databases, but object-oriented databases are better suited for such items.
An
object-oriented database is organized around objects rather than actions, and
data rather than logic. For example, a multimedia record in a relational
database can be a definable data object, as opposed to an alphanumeric value.
11. Graph Databases
The graph
is a collection of nodes and edges where each node is used to represent an
entity and each edge describes the relationship between entities. A
graph-oriented database, or graph database, is a type of NoSQL database that
uses graph theory to store, map and query relationships.
Graph
databases are basically used for analyzing interconnections. For example,
companies might use a graph database to mine data about customers from social
media.
6. Compare and contrast data warehouse with Big data
Meaning
Data Warehouse: A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources (mainly relational databases) and helps generate analytic reports. By definition, it is the data repository from which analytic reports are generated.
Big Data: Big Data is mainly a technology, which stands on the volume, velocity, and variety of the data. Volume defines the amount of data coming from different sources, velocity refers to the speed of data processing, and variety refers to the number of types of data (it supports almost all data formats).
Preferences
Data Warehouse: If an organization wants to make informed decisions (such as what is going on in the corporation, or planning next year based on current-year performance data), it prefers data warehousing, because this kind of report needs reliable data from the sources.
Big Data: If an organization needs to work with large volumes of data containing valuable information that helps it make better decisions (such as how to gain more revenue, profitability or customers), it prefers the Big Data approach.
Accepted data sources
Data Warehouse: Accepts one or more homogeneous (all sites use the same DBMS product) or heterogeneous (sites may run different DBMS products) data sources.
Big Data: Accepts any kind of source, including business transactions, social media, and information from sensors or machine-specific data; the data may or may not come from a DBMS product.
Accepted data formats
Data Warehouse: Handles mainly structured data (specifically relational data).
Big Data: Accepts all types of formats: structured data, relational data, and unstructured data including text documents, email, video, audio, stock ticker data and financial transactions.
Subject-oriented
Data Warehouse: A data warehouse is subject-oriented because it provides information on specific subjects (such as products, customers, suppliers, sales or revenue) rather than on the organization's ongoing operations. It does not focus on ongoing operations; it mainly focuses on analysis and on presenting data that supports decision making.
Big Data: Big Data is also subject-oriented; the main difference is the source of data, as Big Data can accept and process data from all sources, including social media and sensor or machine-specific data. It likewise aims to provide exact, subject-oriented analysis of the data.
Time-variant
Data Warehouse: The data collected in a data warehouse is identified by a particular time period, as it mainly holds historical data for analytical reporting.
Big Data: Big Data has many approaches to identifying already loaded data, and a time period is one of them. Since Big Data mainly processes flat files, archiving by date and time is the usual approach to identifying loaded data. However, it can also work with streaming data, so it does not always hold historical data.
Non-volatile
Data Warehouse: Previous data is never erased when new data is added; this is one of the major features of a data warehouse. Because it is completely separate from the operational database, changes to the operational database do not directly affect the data warehouse.
Big Data: For Big Data, previous data is likewise never erased when new data is added; the data is stored as files that represent tables. In streaming scenarios, however, Hive or Spark is sometimes used directly as the operating environment.
Distributed file system
Data Warehouse: Processing huge volumes of data in data warehousing is time-consuming and can sometimes take an entire day to complete.
Big Data: This is one of the big advantages of Big Data. HDFS (Hadoop Distributed File System) is designed to load huge volumes of data into distributed systems using MapReduce programs.
7. Explain how the application components communicate with files and
databases
Application components are reusable
libraries that you can add to the applications you develop. An application
component can be a client-side library or a server runtime block. Typical
libraries might handle basic functions such as login or payments. They can also
contain various elements such as non-visual runtime objects, visual components,
integration adapters, and user interface screen packages.
Consider the example of a banking
application. The application might require an image-processing library for processing
checks, a non-visual runtime object, and an integration adapter to connect to
the banking system for verification. A developer might consider assembling
these reusable building blocks into application components, and then add them
to multiple MobileFirst projects to accelerate the development of
applications for a range of different devices.
An application component can help
simplify and speed up the delivery of high quality mobile applications across
multiple devices. An application component can also help developers in their
interactions with customers, can provide value-added services, and can help
developers understand how consumers use their mobile applications.
You can create an application component
based on a MobileFirst project. You define metadata information such
as the name of the component and its version number, and you select the project
resources that you want to include in the application component.
You can open an application component to
view its contents by using a file compression tool.
You add hooks to an application
component to facilitate automation when the component is added to a MobileFirst project. These additional hooks are
optional.
After creating an application component
and adding hooks, you must validate the component.wcp file
to ensure that it conforms to the correct syntax.
After you have created and validated
application components, you can add them to your MobileFirst projects.
You can remove application components
from a MobileFirst project if they are no longer required.
Whenever you add or remove an
application component, the existing MobileFirst project
files are backed up.
8. Differentiate the SQL
statements, Prepared statements, and Callable statements
SQL (Structured
Query Language)[5][6][7][8] is a domain-specific
language used in programming and designed for managing data
held in a relational
database management system (RDBMS), or for stream processing in
a relational
data stream management system (RDSMS). It is particularly
useful in handling structured data where
there are relations between different entities/variables of the data. SQL
offers two main advantages over older read/write APIs like ISAM or VSAM.
First, it introduced the concept of accessing many records with one single
command; and second, it eliminates the need to specify how to
reach a record, e.g. with or without an index.
Originally based
upon relational algebra and tuple relational
calculus, SQL consists of many types of statements,[9] which may be
informally classed as sublanguages,
commonly: a data query language (DQL),[a] a data definition
language (DDL),[b] a data control language (DCL),
and a data
manipulation language (DML).[c][10] The scope of SQL
includes data query, data manipulation (insert, update and delete), data
definition (schema creation
and modification), and data access control. Although SQL is often described as,
and to a great extent is, a declarative
language (4GL), it also includes procedural elements.
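As a rough illustration of these sublanguages, the sketch below runs one DDL, two DML, and one DQL statement through a plain JDBC Statement; the table name, the columns, and the in-memory H2 connection URL are only assumptions for the example. Note that each call sends the full SQL text to the database, which is exactly the behaviour prepared statements (below) improve on.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlainSqlDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = con.createStatement()) {

            // DDL: define the schema
            stmt.executeUpdate("CREATE TABLE MyGuests (id INT PRIMARY KEY, firstname VARCHAR(50))");

            // DML: manipulate the data
            stmt.executeUpdate("INSERT INTO MyGuests VALUES (1, 'Alice')");
            stmt.executeUpdate("UPDATE MyGuests SET firstname = 'Alicia' WHERE id = 1");

            // DQL: query the data
            try (ResultSet rs = stmt.executeQuery("SELECT id, firstname FROM MyGuests")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("firstname"));
                }
            }
        }
    }
}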
SQL was one of the
first commercial languages for Edgar F. Codd's relational model. The model was described in
his influential 1970 paper, "A Relational Model of Data for Large Shared
Data Banks".[11] Despite not entirely adhering
to the relational model as
described by Codd, it became the most widely used database language.[12][13]
SQL became a standard of
the American
National Standards Institute (ANSI) in 1986, and of the International
Organization for Standardization (ISO) in 1987.[14] Since then, the standard has been
revised to include a larger set of features. Despite the existence of such
standards, most SQL code is not completely portable among different database
systems without adjustments.
Prepared statement
A prepared statement is a feature used to execute the same (or similar) SQL statements repeatedly with high efficiency.
Prepared statements basically work like this:
1. Prepare: An SQL statement template is created and sent to the database. Certain values are left unspecified, called parameters (labeled "?"). Example:
INSERT INTO MyGuests VALUES(?, ?, ?)
2. The database parses, compiles, and performs query optimization on the SQL statement template, and stores the result without executing it.
3. Execute: At a later time, the application binds the values to the parameters, and the database executes the statement. The application may execute the statement as many times as it wants with different values.
Compared to executing SQL statements directly, prepared statements have three main advantages:
· Prepared statements reduce parsing time, as the preparation of the query is done only once (although the statement is executed multiple times).
· Bound parameters minimize bandwidth to the server, as you need to send only the parameters each time, not the whole query.
· Prepared statements are very useful against SQL injection, because parameter values, which are transmitted later using a different protocol, need not be correctly escaped. If the original statement template is not derived from external input, SQL injection cannot occur.
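A minimal JDBC sketch of the prepare/execute cycle described above, reusing the MyGuests template from the example; the in-memory H2 URL and the assumption that MyGuests has exactly three string-compatible columns are mine, not part of the original notes.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedInsertDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo");
             PreparedStatement ps = con.prepareStatement(
                     // Prepare: the template is parsed and planned once
                     "INSERT INTO MyGuests VALUES(?, ?, ?)")) {

            // Execute: bind values and run, as many times as needed
            ps.setString(1, "John");
            ps.setString(2, "Doe");
            ps.setString(3, "john@example.com");
            ps.executeUpdate();

            ps.setString(1, "Mary");
            ps.setString(2, "Moe");
            ps.setString(3, "mary@example.com");
            ps.executeUpdate();
        }
    }
}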
Callable
statements
The CallableStatement interface is used to call stored procedures and functions.
We can keep business logic in the database through stored procedures and functions, which can improve performance because they are precompiled.
Suppose you need to get the age of an employee based on the date of birth; you might create a function that receives the date as input and returns the employee's age as output.
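A hedged JDBC sketch of the age-from-date-of-birth idea above; calculate_age is a hypothetical stored function that must already exist in the database, and the exact call syntax can vary slightly between drivers.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.Types;

public class EmployeeAgeDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo");
             // calculate_age is a hypothetical stored function: DATE in, INT out
             CallableStatement cs = con.prepareCall("{? = call calculate_age(?)}")) {

            cs.registerOutParameter(1, Types.INTEGER);  // the function's return value
            cs.setDate(2, Date.valueOf("1990-05-14"));  // the date of birth
            cs.execute();

            System.out.println("Age: " + cs.getInt(1));
        }
    }
}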
9. Argue the need for ORM,
explaining the development with and without ORM
ORM (Object-relational mapping)
Object-relational
mapping (ORM, O/RM,
and O/R mapping tool) in computer science is a programming technique
for converting data between incompatible type systems using object-oriented programming languages.
This creates, in effect, a "virtual object database" that can be used from
within the programming language. There are both free and commercial packages
available that perform object-relational mapping, although some programmers opt
to construct their own ORM tools.
In object-oriented
programming, data-management tasks act on objects that
are almost always non-scalar values.
For example, an address book entry that represents a single person along with
zero or more phone numbers and zero or more addresses. This could be modeled in
an object-oriented implementation by a "Person object"
with attributes/fields to
hold each data item that the entry comprises: the person's name, a list of
phone numbers, and a list of addresses. The list of phone numbers would itself
contain "PhoneNumber objects" and so on. The address-book entry is
treated as a single object by the programming language (it can be referenced by
a single variable containing a pointer to the object, for instance). Various
methods can be associated with the object, such as a method to return the
preferred phone number, the home address, and so on.
However, many popular
database products such as SQL database management
systems (DBMS) can only store and manipulate scalar values
such as integers and strings organized within tables. The programmer must either convert the
object values into groups of simpler values for storage in the database (and
convert them back upon retrieval), or only use simple scalar values within the
program. Object-relational mapping implements the first approach.
The heart of the
problem involves translating the logical representation of the objects into an
atomized form that is capable of being stored in the database while preserving
the properties of the objects and their relationships so that they can be
reloaded as objects when needed. If this storage and retrieval functionality is
implemented, the objects are said to be persistent.
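To make the with/without contrast concrete, the sketch below shows the same lookup twice: first with hand-written JDBC mapping (no ORM), then as a JPA-style entity where the mapping is declared once and the provider generates the SQL. The person table, its id and name columns, and the choice of JPA as the ORM are assumptions for illustration only.

import java.sql.*;
import javax.persistence.*;

public class PersonDaoJdbc {
    // Without ORM: the developer maps rows to objects by hand for every query.
    public Person findById(Connection con, long id) throws SQLException {
        try (PreparedStatement ps =
                 con.prepareStatement("SELECT id, name FROM person WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                Person p = new Person();
                p.setId(rs.getLong("id"));       // manual column-to-field mapping
                p.setName(rs.getString("name"));
                return p;
            }
        }
    }
}

// With ORM (JPA-style): the mapping is declared once on the entity,
// and the provider generates the SQL and performs the conversion.
@Entity
class Person {
    @Id @GeneratedValue
    private long id;
    private String name;

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

With an ORM in place, the whole DAO method collapses to something like entityManager.find(Person.class, id), and schema changes are handled by editing the mapping in one place rather than in every query.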
10. Discuss the POJO, Java Beans, and JPA, indicating their
similarities and differences
POJO (Plain Old Java Object)
In software engineering, a Plain Old Java Object (POJO) is an ordinary Java object, not bound by any special restriction and not requiring any classpath. The term was coined by Martin Fowler, Rebecca Parsons and Josh MacKenzie in September 2000:[1]
"We
wondered why people were so against using regular objects in their systems and
concluded that it was because simple objects lacked a fancy name. So we gave
them one, and it's caught on very nicely."[1]
The term
"POJO" initially denoted a Java object which does not follow any of
the major Java object models, conventions, or frameworks; nowadays
"POJO" may be used as an acronym for "Plain Old JavaScript Object"
as well, in which case the term denotes a JavaScript object of similar pedigree.[2]
The term
continues the pattern of older terms for technologies that do not use fancy new
features, such as POTS (Plain Old
Telephone Service) in telephony and Pod (Plain Old
Documentation) in Perl. The equivalent to POJO
on the .NET framework is Plain Old CLR Object (POCO).[3] For PHP, it is
Plain Old PHP Object (POPO).[4][5]
The POJO
phenomenon has most likely gained widespread acceptance because of the need for
a common and easily understood term that contrasts with complicated object
frameworks
Java Beans
In computing based on the Java Platform, JavaBeans are classes that
encapsulate many objects into
a single object (the bean). They are serializable, have a zero-argument
constructor, and allow access to properties using getter and setter methods. The name
"Bean" was given to encompass this standard, which aims to
create reusable software
components for Java.
It is a reusable
software component written in Java that can be manipulated visually in an application
builder tool.
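A minimal sketch of a class that follows the JavaBean conventions just described; the class and property names are arbitrary. Stripped of the Serializable interface and the getter/setter conventions, the same class would simply be a POJO.

import java.io.Serializable;

// A simple JavaBean: serializable, zero-argument constructor,
// and private fields exposed only through getters and setters.
public class PersonBean implements Serializable {

    private String name;
    private int age;

    public PersonBean() { }            // required zero-argument constructor

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}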
Features
· Introspection
Introspection is the process of analyzing a Bean to determine its capabilities. This is an essential feature of the Java Beans API because it allows another application, such as a design tool, to obtain information about a component (see the sketch after this list).
· Properties
A property is a subset of a Bean's state. The values assigned to the properties determine the behaviour and appearance of that component. A property is set through its setter method and read through its getter method.
· Customization
A customizer can provide a step-by-step guide through the process that must be followed to use the component in a specific context.
· Events
· Persistence
Persistence is the ability to save the current state of a Bean, including the values of a Bean's properties and instance variables, to non-volatile storage and to retrieve them at a later time.
· Methods
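The introspection sketch referred to above: a small, self-contained example that uses the standard java.beans API to discover the properties of the PersonBean sketched earlier purely from its getters and setters.

import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class IntrospectionDemo {
    public static void main(String[] args) throws Exception {
        // Ask the JavaBeans API what properties PersonBean exposes,
        // stopping at Object so only the bean's own properties are listed.
        BeanInfo info = Introspector.getBeanInfo(PersonBean.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            System.out.println(pd.getName() + " -> " + pd.getPropertyType().getSimpleName());
        }
        // Prints the "age" and "name" properties discovered from the getters/setters.
    }
}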
Advantages
· The properties, events,
and methods of a bean can be exposed to another application.
· A bean may register to
receive events from other objects and can generate events that are sent to
those other objects.
· Auxiliary software can
be provided to help configure a bean.
· The configuration
settings of a bean can be saved to persistent storage and restored.
Disadvantages
· A class with a zero-argument
constructor is subject to being instantiated in an invalid
state.[1] If such a class is instantiated manually
by a developer (rather than automatically by some kind of framework), the
developer might not realize that the class has been improperly instantiated.
The compiler cannot detect such a problem, and even if it is documented, there
is no guarantee that the developer will see the documentation.
· Having to create
getters for every property and setters for many, most, or all of them can lead
to an immense quantity of boilerplate code.
JPA (Java Persistence API)
The Java Persistence API (JPA) is a Java application programming interface specification that describes the management of relational data in applications using Java Platform, Standard Edition and Java Platform, Enterprise Edition.
Persistence in this context covers three areas:
· the API itself, defined in the javax.persistence package
· the Java Persistence Query Language (JPQL)
· object/relational metadata
History
The final
release date of the JPA 1.0 specification was 11 May 2006 as part of Java Community Process JSR 220. The JPA
2.0 specification was released 10 December 2009 (The Java EE 6 platform
requires JPA 2.0[1].)
The JPA 2.1 specification was released 22 April 2013 (The Java EE 7 platform
requires JPA 2.1[2].)
Entities
A
persistence entity is a lightweight Java class whose
state is typically persisted to a table in
a relational database. Instances of such an
entity correspond to individual rows in
the table. Entities typically have relationships with other entities, and these
relationships are expressed through object/relational metadata.
Object/relational metadata can be specified directly in the entity class file
by using annotations, or in a separate XML descriptor file
distributed with the application.
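A minimal sketch of such an entity, using annotations from the javax.persistence package (as in JPA 2.x); the EMPLOYEE table and FULL_NAME column names are assumptions for the example.

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// Each instance of this entity corresponds to one row of the EMPLOYEE table.
@Entity
@Table(name = "EMPLOYEE")
public class Employee {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "FULL_NAME")
    private String fullName;

    protected Employee() { }                 // required by the JPA provider

    public Employee(String fullName) { this.fullName = fullName; }

    public Long getId() { return id; }
    public String getFullName() { return fullName; }
}

An EntityManager can then store and load rows as objects, for example em.persist(new Employee("Alice")) and em.find(Employee.class, id).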
The
Java Persistence Query Language
The Java Persistence Query Language (JPQL)
makes queries against entities stored in a relational database. Queries
resemble SQL queries
in syntax, but operate against entity objects rather than directly with
database tables.
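A short sketch of a JPQL query against the Employee entity sketched above; note that the query names the entity and its field (Employee, fullName), not the underlying table and column.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

public class JpqlDemo {
    // JPQL queries the Employee entity (and its fields), not the EMPLOYEE table directly.
    public List<Employee> findByName(EntityManager em, String name) {
        TypedQuery<Employee> q = em.createQuery(
                "SELECT e FROM Employee e WHERE e.fullName = :name", Employee.class);
        q.setParameter("name", name);
        return q.getResultList();
    }
}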
Motivation
Prior to the
introduction of EJB 3.0
specification, many enterprise Java developers used lightweight persistent
objects, provided by either persistence frameworks (for example Hibernate)
or data access objects instead of entity
beans. This is because entity beans, in previous EJB specifications, called
for too much complicated code and heavy resource footprint, and they could be
used only in Java EE application servers because of
interconnections and dependencies in the source code between beans and DAO
objects or persistence framework. Thus, many of the features originally
presented in third-party persistence frameworks were incorporated into the Java
Persistence API, and, as of 2006, projects like Hibernate (version
3.2) and TopLink
Essentials have become themselves implementations of the Java Persistence
API specification.
11. Identify the ORM tools available for different development
platforms (Java, PHP, and .Net)
12. Discuss the need for
NoSQL indicating the benefits, also explain different types of NoSQL databases
A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have
existed since the late 1960s, but did not obtain the "NoSQL" moniker
until a surge of popularity in the early 21st century, triggered by the
needs of Web
2.0 companies.[3][4][5] NoSQL databases
are increasingly used in big data and real-time
web applications.[6] NoSQL systems
are also sometimes called "Not only SQL" to emphasize that they may
support SQL-like
query languages, or sit alongside SQL database in a polyglot persistence architecture.[7][8]
Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide
column, graph, or document) are different from those used by default in
relational databases, making some operations faster in NoSQL. The particular
suitability of a given NoSQL database depends on the problem it must solve.
Sometimes the data structures used by NoSQL databases are also viewed as
"more flexible" than relational database tables.
Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL; for instance, the inability to perform ad hoc joins across tables), lack of standardized interfaces, and huge previous investments in existing relational databases.[10] Most NoSQL stores lack true ACID transactions, although a few databases have made them central to their designs.
Instead, most NoSQL databases offer a concept of
"eventual consistency" in which database
changes are propagated to all nodes "eventually" (typically within
milliseconds) so queries for data might not return updated data immediately or
might result in reading data that is not accurate, a problem known as stale
reads.[11] Additionally,
some NoSQL systems may exhibit lost writes and other forms of data loss.[12] Some NoSQL
systems provide concepts such as write-ahead logging to avoid data loss.[13] For distributed transaction processing across
multiple databases, data consistency is an even bigger challenge that is difficult for both NoSQL and relational databases. Even current relational databases "do not allow referential integrity constraints to span databases". Few systems maintain both ACID transactions and X/Open XA standards for distributed transaction processing.
13. Discuss what Hadoop is, explaining the core concepts of it
14. Explain the concept of IR, identifying tools for IR
Hadoop
Apache Hadoop ( /həˈduːp/)
is a collection of open-source software utilities that
facilitate using a network of many computers to solve problems involving
massive amounts of data and computation. It provides a software framework for distributed storage and processing
of big
data using the MapReduce programming
model. Originally designed for computer
clusters built from commodity hardware[3]—still the common
use—it has also found use on clusters of higher-end hardware. All the
modules in Hadoop are designed with a fundamental assumption that hardware
failures are common occurrences and should be automatically handled by the
framework.
The core of Apache Hadoop consists of a storage part,
known as Hadoop Distributed File System (HDFS), and a processing part which is
a MapReduce programming model. Hadoop splits files into large blocks and
distributes them across nodes in a cluster. It then transfers packaged
code into nodes to process the data in parallel. This approach takes
advantage of data locality, where nodes manipulate the data
they have access to. This allows the dataset to be processed faster and more efficiently
than it would be in a more conventional supercomputer architecture that
relies on a parallel file system where computation
and data are distributed via high-speed networking.
The base Apache Hadoop framework is composed of the
following modules:
· Hadoop Common – contains
libraries and utilities needed by other Hadoop modules;
· Hadoop Distributed
File System (HDFS) –
a distributed file-system that stores data on commodity machines, providing
very high aggregate bandwidth across the cluster;
· Hadoop YARN – introduced in
2012 is a platform responsible for managing computing resources in clusters and
using them for scheduling users' applications;
· Hadoop MapReduce – an
implementation of the MapReduce programming model for large-scale data
processing.
The term Hadoop is often used for
both base modules and sub-modules and also the ecosystem, or
collection of additional software packages that can be installed on top of or
alongside Hadoop, such as Apache Pig, Apache
Hive, Apache HBase, Apache
Phoenix, Apache Spark, Apache
ZooKeeper, Cloudera Impala, Apache
Flume, Apache Sqoop, Apache
Oozie, and Apache Storm.
Apache Hadoop's MapReduce and HDFS components were
inspired by Google papers
on MapReduce and Google File System.
The Hadoop framework itself is mostly written in
the Java programming language, with some
native code in C and command line utilities written as shell
scripts. Though MapReduce Java code is common, any programming language can
be used with Hadoop Streaming to implement the map and reduce parts of the
user's program.[14] Other projects in the Hadoop ecosystem expose richer user interfaces.
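Since MapReduce jobs are most commonly written in Java, the sketch below reproduces the classic word-count job (essentially as it appears in the Hadoop MapReduce tutorial) to show what the map and reduce parts look like: mappers run next to the HDFS blocks and emit (word, 1) pairs, and a reducer sums the counts for each word. The input and output HDFS paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split stored on this node.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts that the framework has grouped by word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}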
Information retrieval
Information retrieval (IR) is
the activity of obtaining information system resources relevant to an
information need from a collection. Searches can be based on full-text or
other content-based indexing. Information retrieval is the science of searching
for information in a document, searching for documents themselves, and also
searching for metadata that describe data, and for databases of
texts, images or sounds.
Automated information retrieval systems are used to reduce what has been called information overload. An IR system is software that provides access to books, journals and other documents; it stores and manages those documents. Web search engines are the most visible IR applications.
An information retrieval process begins when a user
enters a query into the system. Queries are formal statements of information
needs, for example search strings in web search engines. In information
retrieval a query does not uniquely identify a single object in the collection.
Instead, several objects may match the query, perhaps with different degrees
of relevancy.
An object is an entity that is represented by
information in a content collection or database. User
queries are matched against the database information. However, as opposed to
classical SQL queries of a database, in information retrieval the results
returned may or may not match the query, so results are typically ranked. This
ranking of results is a key difference of information retrieval searching
compared to database searching.[1]
Depending on the application the data
objects may be, for example, text documents, images,[2] audio,[3] mind maps[4] or videos. Often
the documents themselves are not kept or stored directly in the IR system, but
are instead represented in the system by document surrogates or metadata.
Most IR systems compute a numeric score on how well
each object in the database matches the query, and rank the objects according
to this value. The top ranking objects are then shown to the user. The process may
then be iterated if the user wishes to refine the query.
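As a toy illustration of scoring and ranking (nothing like a production IR engine, which would use an inverted index and weighting such as TF-IDF), the sketch below counts how many query terms each document contains and prints the documents in descending score order. Well-known tools that do this properly include Apache Lucene and the search servers built on it, such as Solr and Elasticsearch.

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SimpleRanker {

    // Score = number of query terms that occur in the document (a crude relevance measure).
    static long score(String query, String document) {
        String doc = document.toLowerCase();
        return Arrays.stream(query.toLowerCase().split("\\s+"))
                     .filter(doc::contains)
                     .count();
    }

    public static void main(String[] args) {
        String query = "graph database";
        List<String> docs = List.of(
                "A graph database stores nodes and edges",
                "Relational databases use tables",
                "Graph theory studies nodes, edges and paths");

        docs.stream()
            .sorted(Comparator.comparingLong((String d) -> score(query, d)).reversed())
            .forEach(d -> System.out.println(score(query, d) + "  " + d));
    }
}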