tute 8

1. Discuss the role of data in information systems indicating the need for data persistence

What Is an Information System?

At the most basic level, an information system (IS) is a set of components that work together to manage data processing and storage. Its role is to support the key aspects of running an organization, such as communication, record-keeping, decision making, data analysis and more. Companies use this information to improve their business operations, make strategic decisions and gain a competitive edge.

Information systems typically include a combination of software, hardware and telecommunication networks. For example, an organization may use customer relationship management systems to gain a better understanding of its target audience, acquire new customers and retain existing clients. This technology allows companies to gather and analyze sales activity data, define the exact target group of a marketing campaign and measure customer satisfaction.

The Benefits of Information Systems

Modern technology can significantly boost your company's performance and productivity. Information systems are no exception. Organizations worldwide rely on them to research and develop new ways to generate revenue, engage customers and streamline time-consuming tasks.

With an information system, businesses can save time and money while making smarter decisions. A company's internal departments, such as marketing and sales, can communicate better and share information more easily.

Since this technology is automated and uses complex algorithms, it reduces human error. Furthermore, employees can focus on the core aspects of a business rather than spending hours collecting data, filling out paperwork and doing manual analysis.

Thanks to modern information systems, team members can access massive amounts of data from one platform. For example, they can gather and process information from different sources, such as vendors, customers, warehouses and sales agents, with a few mouse clicks.

Uses and Applications

There are different types of information systems and each has a different role. Business intelligence (BI) systems, for instance, can turn data into valuable insights.

This kind of technology allows for faster, more accurate reporting, better business decisions and more efficient resource allocation. Another major benefit is data visualization, which enables analysts to interpret large amounts of information, predict future events and find patterns in historical data.

Organizations can also use enterprise resource planning (ERP) software to collect, manage and analyze data across different areas, from manufacturing to finance and accounting. This type of information system consists of multiple applications that provide a 360-degree view of business operations. NetSuite ERP, PeopleSoft, Odoo and Intacct are just a few examples of ERP software.

Like other information systems, ERP provides actionable insights and helps you decide on the next steps. It also makes it easier to achieve regulatory compliance, increase data security and share information between departments. Additionally, it helps to ensure that all of your financial records are accurate and up-to-date.

In the long run, ERP software can reduce operational costs, improve collaboration and boost your revenue. Nearly half of the companies that implement this system report major benefits within six months.

At the end of the day, information systems can give you a competitive advantage and provide the data you need to make faster, smarter business decisions. Depending on your needs, you can opt for transaction processing systems, knowledge management systems, decision support systems and more. When choosing one, consider your budget, industry and business size. Look for an information system that aligns with your goals and can streamline your day-to-day operations.

2. Explain the terms: Data, Database, Database Server, and Database Management System

Data

Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DAT-ə, /ˈdɑːtə/ DAH-tə)^[1] is a set of values of subjects with respect to qualitative or quantitative variables.

The terms data, information and knowledge are often used interchangeably; however, data becomes information when it is viewed in context or in post-analysis.^[2] While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations).

Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs, images or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording. Data has been described as the new oil of the digital economy.^[3]^[4]
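As a minimal sketch of the cleaning step described above, the following Python snippet drops readings that are implausible for the place where they were taken. The readings, the plausible range and the function name are all assumptions made up for illustration, not part of any real data set.

```python
# Hypothetical raw readings (degrees Celsius) from an outdoor Arctic sensor.
raw_readings = [-31.5, -29.8, 27.4, -30.2, -28.9]

# Assumed plausible range for this location; a real study would justify these bounds.
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 15.0


def clean(readings, low, high):
    """Keep only readings inside the plausible range [low, high]."""
    return [r for r in readings if low <= r <= high]


# The "processed data" of this stage could be the "raw data" of the next one.
processed = clean(raw_readings, PLAUSIBLE_MIN_C, PLAUSIBLE_MAX_C)
print(processed)  # the tropical-looking 27.4 reading is dropped
```

A fixed range is only one possible rule; statistical outlier tests are a common alternative when no physical bounds are known.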

Database

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a "database system". Often the term "database" is also used to loosely refer to any of the DBMS, the database system or an application associated with the database.

Computer scientists may classify database-management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, referred to as NoSQL because they use different query languages.

Formally, a "database" refers to a set of related data and the way it is organized. Access to this data is usually provided by a "database management system" (DBMS) consisting of an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized.

Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it.

Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index), even though size and usage requirements typically necessitate use of a database management system.

Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups:

·         Data definition – Creation, modification and removal of definitions that define the organization of the data.

·         Update – Insertion, modification, and deletion of the actual data.

·         Retrieval – Providing information in a form directly usable or for further processing by other applications. The retrieved data may be made available in a form basically the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.

·         Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.^[4]
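The four functional groups can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the student table and its contents are made up, and SQLite's integrity check stands in for the much broader set of administration functions a full DBMS offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: describe how the data is organised.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Update: insert, modify and delete the actual data.
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Ben')")
conn.execute("UPDATE student SET name = 'Benjamin' WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 1")
conn.commit()

# Retrieval: return data in a form the application can use directly.
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

# Administration: one small example, an integrity check on the stored data.
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]

print(rows, integrity)
conn.close()
```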

Both a database and its DBMS conform to the principles of a particular database model. "Database system" refers collectively to the database model, database management system, and database.

Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. RAID is used for recovery of data if any of the disks fail. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.

Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans.^[7]

Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security.

Database Server

A database server is a server which houses a database application that provides database services to other computer programs or to computers, as defined by the client–server model.^[1]^[2] Database management systems (DBMSs) frequently provide database-server functionality, and some database management systems (such as MySQL) rely exclusively on the client–server model for database access, while others (e.g. SQLite) are meant to be used as an embedded database.

Users access a database server either through a "front end" running on the user's computer – which displays requested data – or through the "back end", which runs on the server and handles tasks such as data analysis and storage.

In a master-slave model, database master servers are central and primary locations of data while database slave servers are synchronized backups of the master acting as proxies.

Most database applications respond to a query language. Each database understands its query language and converts each submitted query to server-readable form and executes it to retrieve results.

Examples of proprietary database applications include Oracle, DB2, Informix, and Microsoft SQL Server. Examples of free software database applications include PostgreSQL and, under the GNU General Public Licence, Ingres and MySQL. Every server uses its own query logic and structure, although the SQL (Structured Query Language) query language is more or less the same across all relational database applications.

For clarification, a database server is simply a server that maintains services related to clients via database applications.

Database Management System

A database management system (DBMS) is system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.

A DBMS makes it possible for end users to create, read, update and delete data in a database. The DBMS essentially serves as an interface between the database and end users or application programs, ensuring that data is consistently organized and remains easily accessible.

The DBMS manages three important things: the data, the database engine that allows data to be accessed, locked and modified, and the database schema, which defines the database’s logical structure. These three foundational elements help provide concurrency, security, data integrity and uniform administration procedures. Typical database administration tasks supported by the DBMS include change management, performance monitoring/tuning and backup and recovery. Many database management systems are also responsible for automated rollbacks, restarts and recovery as well as the logging and auditing of activity.

The DBMS is perhaps most useful for providing a centralized view of data that can be accessed by multiple users, from multiple locations, in a controlled manner. A DBMS can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema. End users and software programs are free from having to understand where the data is physically located or on what type of storage media it resides because the DBMS handles all requests.
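The idea of limiting what an end user sees can be sketched with a view. The example below uses Python's sqlite3 module with a made-up employee table; the table, column and view names are assumptions. Note that SQLite has no user accounts, so the view only illustrates the restricted "window" onto the data, while a server DBMS would combine it with access permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 5200), (2, "Ben", 4100)],
)

# A view that exposes names only; the salary column is not part of the view.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
visible_columns = [col[0] for col in cursor.description]
directory = cursor.fetchall()

print(visible_columns, directory)
conn.close()
```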

The DBMS can offer both logical and physical data independence. That means it can protect users and applications from needing to know where data is stored or having to be concerned about changes to the physical structure of data (storage and hardware). As long as programs use the application programming interface (API) for the database that is provided by the DBMS, developers won't have to modify programs just because changes have been made to the database.

With relational DBMSs (RDBMSs), this API is SQL, a standard programming language for defining, protecting and accessing data in an RDBMS.

Popular types of DBMSes

Popular database models and their management systems include:

Relational database management system (RDBMS) - adaptable to most use cases, but RDBMS Tier-1 products can be quite expensive.

NoSQL DBMS - well-suited for loosely defined data structures that may evolve over time.

In-memory database management system (IMDBMS) - provides faster response times and better performance.

Columnar database management system (CDBMS) - well-suited for data warehouses that have a large number of similar data items.

Cloud-based data management system - the cloud service provider is responsible for providing and maintaining the DBMS.

Advantages of a DBMS

Using a DBMS to store and manage data comes with advantages, but also overhead. One of the biggest advantages of using a DBMS is that it lets end users and application programmers access and use the same data while managing data integrity. Data is better protected and maintained when it can be shared using a DBMS instead of creating new iterations of the same data stored in new files for every new application. The DBMS provides a central store of data that can be accessed by multiple users in a controlled manner.

Central storage and management of data within the DBMS provides:

·         Data abstraction and independence

·         Data security

·         A locking mechanism for concurrent access

·         An efficient handler to balance the needs of multiple applications using the same data

·         The ability to swiftly recover from crashes and errors, including restartability and recoverability

·         Robust data integrity capabilities

·         Logging and auditing of activity

·         Simple access using a standard application programming interface (API)

·         Uniform administration procedures for data
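Two items from this list, data integrity and recovery from errors, can be sketched together. In the example below (Python's sqlite3 module, with a made-up account table), a CHECK constraint rejects a negative balance and the surrounding transaction is rolled back, so the half-finished transfer leaves no trace. The table and the amounts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("UPDATE account SET balance = balance + 150 WHERE id = 2")
        # This would make the balance negative, violating the CHECK constraint.
        conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
except sqlite3.IntegrityError:
    transfer_failed = True
else:
    transfer_failed = False

# Both updates were undone together, so the balances are unchanged.
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(transfer_failed, balances)
conn.close()
```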

Another advantage of a DBMS is that it can be used to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations.

A DBMS can also provide many views of a single database schema. A view defines what data the user sees and how that user sees the data. The DBMS provides a level of abstraction between the conceptual schema that defines the logical structure of the database and the physical schema that describes the files, indexes and other physical mechanisms used by the database. When a DBMS is used, systems can be modified much more easily when business requirements change. New categories of data can be added to the database without disrupting the existing system and applications can be insulated from how data is structured and stored.

Of course, a DBMS must perform additional work to provide these advantages, thereby bringing with it the overhead. A DBMS will use more memory and CPU than a simple file storage system. And, of course, different types of DBMSes will require different types and levels of system resources.

3. Compare Files and Databases, discussing pros and cons of them

File

A data file is a collection of related records stored on a storage medium such as a hard disk or optical disc. A Student file at a school might consist of thousands of individual student records. Each student record in the file contains the same fields. Each field, however, contains different data. The image shows a small sample Student file that contains four student records, each with eleven fields. A database includes a group of related data files.

Database

A database is a collection of data organized in a manner that allows access, retrieval, and use of that data. Data is a collection of unprocessed items, which can include text, numbers, images, audio, and video. Information is processed data; that is, it is organized, meaningful, and useful.

Computers process data in a database into information. A database at a school, for example, contains data about students, e.g., student data, class data, etc. A computer at the school processes new student data and then sends advising appointment and ID card information to the printers.
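To make the comparison concrete, here is a small sketch that stores the same student records once in a file and once in a database, then answers the same question from each. The records and field names are hypothetical, and an in-memory buffer stands in for a file on disk so the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical student records; the fields are assumptions for illustration.
students = [(1, "Asha", "Science"), (2, "Ben", "Arts"), (3, "Chen", "Science")]

# File approach: the application writes the records and must scan them itself.
buffer = io.StringIO()  # stands in for a data file on disk
csv.writer(buffer).writerows(students)
buffer.seek(0)
science_from_file = [row[1] for row in csv.reader(buffer) if row[2] == "Science"]

# Database approach: the DBMS stores the records and answers a declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
science_from_db = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM student WHERE major = ? ORDER BY id", ("Science",)
    )
]
conn.close()

print(science_from_file, science_from_db)
```

With the file, the program has to know the field positions and read every record. With the database, the program states what it wants and the DBMS decides how to find it, at the cost of the extra software layer.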

4. Discuss different arrangements of data, giving examples for each

Linear arrangement

A Linear arrangement can be defined as a straight line arrangement typically involving not more than two dimensions. The key factor to be noted here is that arrangements are done only on one axis. When A is said to be on the left or ahead of B, in a linear arrangement, it cannot be assumed that A is to the immediate left of B or immediately ahead of B unless it is mentioned so specifically.

The directions given are relative in nature as it depends on from whose perspective the test-taker is deciding the directions. For example, if four people P, Q, R, S are sitting at a table from left to right in the same order, then Q is sitting to the left of R but to the right of P. Change in orientation, left and right, depends on two possible scenarios i.e. whether the test-taker assumes people to be facing the direction he is facing or whether he assumes them to be facing the opposite direction. But as long as consistency is maintained in incorporating the directions, this fact should not change the solution as the two scenarios are mirror images of each other.
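The P, Q, R, S example can be checked with a short sketch. The helper function names are made up for illustration; the list simply records the seating order from left to right.

```python
# Seating order from left to right, as in the example above.
row = ["P", "Q", "R", "S"]


def is_left_of(a, b, order):
    """True if a sits anywhere to the left of b (not necessarily adjacent)."""
    return order.index(a) < order.index(b)


def is_immediate_left_of(a, b, order):
    """True only if a sits directly next to b, on b's left."""
    return order.index(a) + 1 == order.index(b)


# Mirror image: the same people viewed as facing the opposite direction.
mirrored = list(reversed(row))

print(is_left_of("Q", "R", row))            # Q is to the left of R
print(is_left_of("P", "Q", row))            # so Q is to the right of P
print(is_immediate_left_of("P", "R", row))  # P is left of R, but not immediately
print(is_left_of("R", "Q", mirrored))       # left and right swap in the mirror image
```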

Circular arrangement

A Circular arrangement can be defined as an arrangement having a closed loop. Typical examples include situations wherein seating arrangements around a table have to be made. The table can be of any shape and need not necessarily be circular. This is illustrated by the following diagrams.

Though the above diagrams look very different in terms of their structure, there would be minimal deviations in the interpretation of some common clues for all these diagrams.

For example, A is sitting opposite to D. B is sitting to the immediate left of A. B is sitting between A and C.
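These clues can be tested by brute force. The sketch below makes several assumptions that the clues themselves do not state: six people sit at the table (E and F are invented to fill the remaining seats), everyone faces the centre, and seats are numbered clockwise, so a person's immediate left is the next seat clockwise.

```python
from itertools import permutations

PEOPLE = ["A", "B", "C", "D", "E", "F"]  # E and F are assumed extra people
N = len(PEOPLE)


def satisfies_clues(seating):
    """seating[i] is the person in seat i; seats are numbered clockwise."""
    seat = {person: i for i, person in enumerate(seating)}
    a_opposite_d = (seat["A"] + N // 2) % N == seat["D"]
    # Assumption: facing the centre, "immediate left" is the next seat clockwise.
    b_left_of_a = (seat["A"] + 1) % N == seat["B"]
    # With B directly clockwise of A, "B between A and C" puts C on B's other side.
    c_after_b = (seat["B"] + 1) % N == seat["C"]
    return a_opposite_d and b_left_of_a and c_after_b


# Fix A in seat 0 so rotations of the same arrangement are not counted twice.
solutions = [
    ("A",) + rest
    for rest in permutations(PEOPLE[1:])
    if satisfies_clues(("A",) + rest)
]
for s in solutions:
    print(s)
```

Under these assumptions the three clues fix A, B, C and D, and only the two invented people remain interchangeable.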

Complex arrangement

Complex arrangements are arrangements which involve more than two dimensions. The approach for these problems should be very similar to that of the linear arrangement problems except for the fact that the logical framework for interpreting the problem assumes special significance in this case. A lot of information needs to be comprehended in a complex arrangement problem, and hence, care should be taken to ensure that an appropriate framework which will aid smooth fitting and assimilation of data will be used.

5. Explain different types of databases, providing examples for their use

Centralized database

The information (data) is stored at a centralized location and users from different locations can access this data. This type of database contains application procedures that help users access the data even from a remote location.

Various authentication procedures are applied to verify and validate end users. Likewise, the application procedures provide a registration number that keeps track of data usage. The local area office handles this.

Distributed database

In contrast to the centralized database concept, a distributed database combines contributions from the common database with information captured by local computers. The data is not held in one place but is distributed across various sites of an organization. These sites are connected by communication links, which let them access the distributed data easily.

You can think of a distributed database as one in which various portions of the database are stored in multiple physical locations, with the application procedures replicated and distributed among various points in a network.

There are two kinds of distributed database (DDB): homogeneous and heterogeneous. In a homogeneous DDB, all physical locations have the same underlying hardware and run the same operating systems and application procedures. In a heterogeneous DDB, the operating systems, underlying hardware and application procedures can differ from site to site.

Personal database

Data is collected and stored on personal computers, so the database is small and easily manageable. The data is generally used by a single department of an organization and is accessed by a small group of people.

End-user database

The end user is usually not concerned about the transactions or operations performed at various levels and is only aware of the product, which may be software or an application. This is therefore a shared database designed specifically for end users, such as managers at different levels. It holds a summary of the whole information.

Commercial database

These are paid versions of huge databases, designed for users who want to access the information for help. They are subject-specific, and individual users could not afford to maintain such a huge body of information themselves. Access to such databases is provided through commercial links.

NoSQL database

These are used for large sets of distributed data. Some big data performance issues that relational databases do not handle effectively are managed more easily by NoSQL databases. They are very efficient at analyzing large volumes of unstructured data that may be stored on multiple virtual servers in the cloud.

Operational database

Information related to the operations of an enterprise is stored in this database. Functional lines such as marketing, employee relations and customer service require this kind of database.

Relational database

These databases are organized as a set of tables in which data fits into pre-defined categories. Each table consists of rows and columns, where each column holds the data for a specific category and each row contains one instance of the data defined by those categories. The Structured Query Language (SQL) is the standard user and application program interface for a relational database.

Various simple operations can be applied to the tables, which makes these databases easy to extend and easy to join on a common relation, without having to modify all existing applications.
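Joining two tables on a common relation can be sketched as follows, using Python's sqlite3 module. The customer and purchase tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, "laptop"), (11, 1, "mouse"), (12, 2, "desk")],
)

# Join the two tables on their common column (customer.id = purchase.customer_id).
rows = conn.execute(
    "SELECT customer.name, purchase.item "
    "FROM customer JOIN purchase ON purchase.customer_id = customer.id "
    "ORDER BY purchase.id"
).fetchall()

print(rows)
conn.close()
```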

Cloud database

Nowadays, data is increasingly stored in the cloud, also known as a virtual environment, whether in a public, private or hybrid cloud. A cloud database is a database that has been optimized or built for such a virtualized environment. Its benefits include the ability to pay for storage capacity and bandwidth on a per-use basis, scalability on demand, and high availability.

Object-oriented database

An object-oriented database combines object-oriented programming with relational database concepts. Items created using object-oriented programming languages such as C++ and Java can be stored in relational databases, but object-oriented databases are better suited to them.

An object-oriented database is organized around objects rather than actions, and data rather than logic. For example, a multimedia record in a relational database can be a definable data object, as opposed to an alphanumeric value.

Graph database

A graph is a collection of nodes and edges, where each node represents an entity and each edge describes a relationship between entities. A graph-oriented database, or graph database, is a type of NoSQL database that uses graph theory to store, map and query relationships.

Graph databases are mainly used for analyzing interconnections. For example, companies might use a graph database to mine data about customers from social media.
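The node-and-edge idea can be sketched without any database product at all. The snippet below stores a made-up "follows" graph as a Python dictionary and asks a typical interconnection question. It is only an illustration of the data model; a real graph database would provide its own storage engine and query language.

```python
# Each node is a person; each edge is a "follows" relationship (made-up data).
follows = {
    "asha": {"ben", "chen"},
    "ben": {"chen", "dana"},
    "chen": {"dana"},
    "dana": set(),
}


def friends_of_friends(graph, start):
    """People reachable in two hops from start who are not already direct contacts."""
    direct = graph[start]
    two_hops = set()
    for person in direct:
        two_hops |= graph[person]
    return two_hops - direct - {start}


print(sorted(friends_of_friends(follows, "asha")))
```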

6. Compare and contrast data warehouse with Big data

Basis for comparison: Meaning

Data warehouse: A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources (mainly relational databases) and helps generate analytic reports. By definition, it is the data repository, built by a single process, from which analytic reports are generated.

Big data: Big data is mainly a technology, which rests on the volume, velocity and variety of the data. Volume is the amount of data coming from different sources, velocity is the speed of data processing, and variety is the number of types of data (it supports mainly all types of data format).

Basis for comparison: Preferences

Data warehouse: If an organization wants to make informed decisions (such as understanding what is going on in the corporation, or planning next year based on the current year's performance data), it prefers data warehousing, because this kind of report needs reliable, believable data from the sources.

Big data: If an organization needs to analyze a large amount of big data that contains valuable information and helps it make better decisions (such as how to increase revenue, profitability or customer numbers), it prefers the big data approach.

Basis for comparison: Accepted data source

Data warehouse: Accepts one or more homogeneous (all sites use the same DBMS product) or heterogeneous (sites may run different DBMS products) data sources.

Big data: Accepts any kind of source, including business transactions, social media, and sensor or machine-specific data. The data may or may not come from a DBMS product.

Basis for comparison: Accepted type of formats

Data warehouse: Handles mainly structured data (specifically relational data).

Big data: Accepts all types of formats: structured data, relational data, and unstructured data including text documents, email, video, audio, stock ticker data and financial transactions.

Basis for comparison: Subject-oriented

Data warehouse: A data warehouse is subject-oriented because it provides information on a specific subject (such as a product, customers, suppliers, sales or revenue) rather than on the organization's ongoing operations. It focuses mainly on analyzing or displaying data that helps decision making.

Big data: Big data is also subject-oriented. The main difference is the source of the data, as big data can accept and process data from all sources, including social media and sensor or machine-specific data. It also aims mainly to provide exact analysis of data for a specific subject.

Basis for comparison: Time-variant

Data warehouse: The data collected in a data warehouse is identified with a particular time period, as it mainly holds historical data for analytical reports.

Big data: Big data has many approaches to identifying already loaded data, and a time period is one of them. As big data mainly processes flat files, archiving with date and time is the best approach to identifying loaded data. It can also work with streaming data, so it does not always hold historical data.

Basis for comparison: Non-volatile

Data warehouse: Previous data is never erased when new data is added. This is one of the major features of a data warehouse. As it is entirely separate from an operational database, changes to an operational database do not directly affect the data warehouse.

Big data: Again, previous data is never erased when new data is added. It is stored as a file that represents a table. In the case of streaming, however, Hive or Spark is sometimes used directly as the operating environment.

Basis for comparison: Distributed file system

Data warehouse: Processing huge volumes of data in a data warehouse is very time-consuming, and sometimes takes an entire day to complete.

Big data: This is one of the big advantages of big data. HDFS (Hadoop Distributed File System) is mainly designed to load huge volumes of data across distributed systems using MapReduce programs.
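The map reduce idea mentioned in the last row can be sketched in plain Python. This is not Hadoop code: it runs in a single process, and the input chunks and function names are assumptions for illustration. It only shows the three conceptual steps (map, group by key, reduce) that such a framework would spread across many machines.

```python
from collections import defaultdict

# Hypothetical input, split into chunks as if stored on different nodes.
chunks = ["big data big", "data warehouse", "big warehouse"]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```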

7. Explain how the application components communicate with files and databases

Application components are reusable libraries that you can add to the applications you develop. An application component can be a client-side library or a server runtime block. Typical libraries might handle basic functions such as login or payments. They can also contain various elements such as non-visual runtime objects, visual components, integration adapters, and user interface screen packages.

Consider the example of a banking application. The application might require an image-processing library for processing checks, a non-visual runtime object, and an integration adapter to connect to the banking system for verification. A developer might consider assembling these reusable building blocks into application components, and then add them to multiple MobileFirst projects to accelerate the development of applications for a range of different devices.

An application component can help simplify and speed up the delivery of high quality mobile applications across multiple devices. An application component can also help developers in their interactions with customers, can provide value-added services, and can help developers understand how consumers use their mobile applications.

·         Creating application components from MobileFirst projects

You can create an application component based on a MobileFirst project. You define metadata information such as the name of the component and its version number, and you select the project resources that you want to include in the application component.

·         Viewing the contents of an application component

You can open an application component to view its contents by using a file compression tool.

·         Adding hooks to an application component

You add hooks to an application component to facilitate automation when the component is added to a MobileFirst project. These additional hooks are optional.

·         Validating application components

After creating an application component and adding hooks, you must validate the component.wcp file to ensure that it conforms to the correct syntax.

·         Adding application components to MobileFirst projects

After you have created and validated application components, you can add them to your MobileFirst projects.

·         Removing application components from MobileFirst projects

You can remove application components from a MobileFirst project if they are no longer required.

·         Troubleshooting adding and removing application components

Whenever you add or remove an application component, the existing MobileFirst project files are backed up.

8. Differentiate the SQL statements, Prepared statements, and Callable statements

SQL (Structured Query Language)^[5][6][7][8] is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data where there are relations between different entities/variables of the data. SQL offers two main advantages over older read/write APIs like ISAM or VSAM. First, it introduced the concept of accessing many records with a single command; second, it eliminated the need to specify how to reach a record, e.g. with or without an index.

Originally based upon relational algebra and tuple relational calculus, SQL consists of many types of statements,^[9] which may be informally classed as sublanguages, commonly: a data query language (DQL),^[a] a data definition language (DDL),^[b] a data control language (DCL), and a data manipulation language (DML).^[c][10] The scope of SQL includes data query, data manipulation (insert, update and delete), data definition (schema creation and modification), and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements.

SQL was one of the first commercial languages for Edgar F. Codd's relational model. The model was described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks".^[11] Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language.^[12][13]

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.^[14] Since then, the standard has been revised to include a larger set of features. Despite the existence of such standards, most SQL code is not completely portable among different database systems without adjustments.

Prepared statements

A prepared statement is a feature used to execute the same (or similar) SQL statements repeatedly with high efficiency.

Prepared statements basically work like this:

1.               Prepare: An SQL statement template is created and sent to the database. Certain values are left unspecified, called parameters (labeled "?"). Example: INSERT INTO MyGuests VALUES(?, ?, ?)

2.               The database parses, compiles, and performs query optimization on the SQL statement template, and stores the result without executing it

3.               Execute: At a later time, the application binds the values to the parameters, and the database executes the statement. The application may execute the statement as many times as it wants with different values

Compared to executing SQL statements directly, prepared statements have three main advantages:

·                  Prepared statements reduce parsing time as the preparation on the query is done only once (although the statement is executed multiple times)

·                  Bound parameters minimize bandwidth to the server, as you need to send only the parameters each time, and not the whole query

·                  Prepared statements are very useful against SQL injections, because parameter values, which are transmitted later using a different protocol, need not be correctly escaped. If the original statement template is not derived from external input, SQL injection cannot occur.
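The prepare, bind and execute steps above can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real server, and the three MyGuests column names are assumptions, since the template above does not name them.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
# The column names below are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyGuests (firstname TEXT, lastname TEXT, email TEXT)")

# The statement template is written once; each "?" marks a parameter.
insert_sql = "INSERT INTO MyGuests VALUES (?, ?, ?)"

guests = [
    ("John", "Doe", "john@example.com"),
    ("Mary", "Moe", "mary@example.com"),
    # A hostile value is bound as plain data; it is never parsed as SQL.
    ("Robert'); DROP TABLE MyGuests;--", "Tables", "bobby@example.com"),
]

# The same template is executed repeatedly with different bound values.
conn.executemany(insert_sql, guests)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM MyGuests").fetchone()[0]
hostile_lastname = conn.execute(
    "SELECT lastname FROM MyGuests WHERE firstname = ?",
    ("Robert'); DROP TABLE MyGuests;--",),
).fetchone()[0]
```

The table still exists after the insert, and the hostile string can be read back as an ordinary value, because only the template was ever treated as SQL.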

Callable statements

The JDBC CallableStatement interface is used to call stored procedures and functions.

Stored procedures and functions let us keep business logic in the database, which can improve performance because they are precompiled.

Suppose you need to get the age of an employee based on the date of birth: you can create a function that receives the date as input and returns the employee's age as output.



9. Argue the need for ORM, explaining the development with and without ORM

ORM (Object-relational mapping)

Object-relational mapping (ORM, O/RM, and O/R mapping tool) in computer science is a programming technique for converting data between incompatible type systems using object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language. There are both free and commercial packages available that perform object-relational mapping, although some programmers opt to construct their own ORM tools.

In object-oriented programming, data-management tasks act on objects that are almost always non-scalar values. For example, consider an address book entry that represents a single person along with zero or more phone numbers and zero or more addresses. This could be modeled in an object-oriented implementation by a "Person object" with attributes/fields to hold each data item that the entry comprises: the person's name, a list of phone numbers, and a list of addresses. The list of phone numbers would itself contain "PhoneNumber objects" and so on. The address-book entry is treated as a single object by the programming language (it can be referenced by a single variable containing a pointer to the object, for instance). Various methods can be associated with the object, such as a method to return the preferred phone number, the home address, and so on.

However, many popular database products such as SQL database management systems (DBMS) can only store and manipulate scalar values such as integers and strings organized within tables. The programmer must either convert the object values into groups of simpler values for storage in the database (and convert them back upon retrieval), or only use simple scalar values within the program. Object-relational mapping implements the first approach.

The heart of the problem involves translating the logical representation of the objects into an atomized form that is capable of being stored in the database while preserving the properties of the objects and their relationships so that they can be reloaded as objects when needed. If this storage and retrieval functionality is implemented, the objects are said to be persistent.
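To make the "without ORM" case concrete, here is a minimal sketch of the hand-written mapping a developer would otherwise maintain. The Person class, the two table layouts and the helper names save_person and load_person are all hypothetical, and SQLite (via Python's sqlite3 module) stands in for the relational database.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Hypothetical address-book entry: one name, many phone numbers."""
    name: str
    phone_numbers: List[str] = field(default_factory=list)


def save_person(conn, person):
    """Flatten one Person object into scalar rows spread over two tables."""
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (person.name,))
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO phone_number (person_id, number) VALUES (?, ?)",
        [(person_id, number) for number in person.phone_numbers],
    )
    return person_id


def load_person(conn, person_id):
    """Rebuild the object graph from the scalar rows."""
    name = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT number FROM phone_number WHERE person_id = ? ORDER BY id",
        (person_id,),
    ).fetchall()
    return Person(name, [row[0] for row in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE phone_number ("
    "id INTEGER PRIMARY KEY, person_id INTEGER REFERENCES person(id), number TEXT)"
)

original = Person("Ada", ["555-0100", "555-0199"])
person_id = save_person(conn, original)
restored = load_person(conn, person_id)
```

Every new field or relationship means editing both helpers and the schema by hand. An ORM tool takes over this conversion, so that application code works with objects such as Person directly.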

10. Discuss the POJO, Java Beans, and JPA, indicating their similarities and differences

POJO (Plain Old Java Object)

In software engineering, a Plain Old Java Object (POJO) is an ordinary Java object, not bound by any special restriction and not requiring any class path. The term was coined by Martin Fowler, Rebecca Parsons and Josh MacKenzie in September 2000: ^[1]

"We wondered why people were so against using regular objects in their systems and concluded that it was because simple objects lacked a fancy name. So we gave them one, and it's caught on very nicely."^[1]

The term "POJO" initially denoted a Java object which does not follow any of the major Java object models, conventions, or frameworks; nowadays "POJO" may be used as an acronym for "Plain Old JavaScript Object" as well, in which case the term denotes a JavaScript object of similar pedigree.^[2]

The term continues the pattern of older terms for technologies that do not use fancy new features, such as POTS (Plain Old Telephone Service) in telephony and Pod (Plain Old Documentation) in Perl. The equivalent to POJO on the .NET framework is Plain Old CLR Object (POCO).^[3] For PHP, it is Plain Old PHP Object (POPO).^[4][5]

The POJO phenomenon has most likely gained widespread acceptance because of the need for a common and easily understood term that contrasts with complicated object frameworks.

JavaBeans

In computing based on the Java Platform, JavaBeans are classes that encapsulate many objects into a single object (the bean). They are serializable, have a zero-argument constructor, and allow access to properties using getter and setter methods. The name "Bean" was given to encompass this standard, which aims to create reusable software components for Java.

A JavaBean is a reusable software component written in Java that can be manipulated visually in an application builder tool.

Features

·         Introspection

Introspection is the process of analyzing a Bean to determine its capabilities. This is an essential feature of the JavaBeans API because it allows another application, such as a design tool, to obtain information about a component.

·         Properties

A property is a subset of a Bean's state. The values assigned to the properties determine the behaviour and appearance of that component. A property is set through a setter method and read through a getter method.

·         Customization

A customizer can provide a step-by-step guide through the process that must be followed to use the component in a specific context.

·         Events

·         Persistence

It is the ability to save the current state of a Bean, including the values of a Bean's properties and instance variables, to nonvolatile storage and to retrieve them at a later time.

·         Methods

Advantages

·         The properties, events, and methods of a bean can be exposed to another application.

·         A bean may register to receive events from other objects and can generate events that are sent to those other objects.

·         Auxiliary software can be provided to help configure a bean.

·         The configuration settings of a bean can be saved to persistent storage and restored.

Disadvantages

·         A class with a zero-argument constructor is subject to being instantiated in an invalid state.^[1] If such a class is instantiated manually by a developer (rather than automatically by some kind of framework), the developer might not realize that the class has been improperly instantiated. The compiler cannot detect such a problem, and even if it is documented, there is no guarantee that the developer will see the documentation.

·         JavaBeans are inherently mutable and so lack the advantages offered by immutable objects.^[1]

·         Having to create getters for every property and setters for many, most, or all of them can lead to an immense quantity of boilerplate code.

JPA (Java Persistence API)

The Java Persistence API (JPA) is a Java application programming interface specification that describes the management of relational data in applications using Java Platform, Standard Edition and Java Platform, Enterprise Edition.

Persistence in this context covers three areas:

·         the API itself, defined in the `javax.persistence` package

·         the Java Persistence Query Language (JPQL)

·         object/relational metadata

History

The final release date of the JPA 1.0 specification was 11 May 2006 as part of Java Community Process JSR 220. The JPA 2.0 specification was released 10 December 2009 (The Java EE 6 platform requires JPA 2.0^[1].) The JPA 2.1 specification was released 22 April 2013 (The Java EE 7 platform requires JPA 2.1^[2].)

Entities

A persistence entity is a lightweight Java class whose state is typically persisted to a table in a relational database. Instances of such an entity correspond to individual rows in the table. Entities typically have relationships with other entities, and these relationships are expressed through object/relational metadata. Object/relational metadata can be specified directly in the entity class file by using annotations, or in a separate XML descriptor file distributed with the application.

The Java Persistence Query Language

The Java Persistence Query Language (JPQL) makes queries against entities stored in a relational database. Queries resemble SQL queries in syntax, but operate against entity objects rather than directly with database tables.

Motivation

Prior to the introduction of the EJB 3.0 specification, many enterprise Java developers used lightweight persistent objects, provided either by persistence frameworks (for example Hibernate) or by data access objects, instead of entity beans. This is because entity beans, in previous EJB specifications, called for too much complicated code and carried a heavy resource footprint, and they could be used only in Java EE application servers because of interconnections and dependencies in the source code between beans and DAO objects or the persistence framework. Thus, many of the features originally presented in third-party persistence frameworks were incorporated into the Java Persistence API, and, as of 2006, projects like Hibernate (version 3.2) and TopLink Essentials have themselves become implementations of the Java Persistence API specification.

11. Identify the ORM tools available for different development platforms (Java, PHP, and .Net)

12. Discuss the need for NoSQL indicating the benefits, also explain different types of NoSQL databases

A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge of popularity in the early 21st century, triggered by the needs of Web 2.0 companies.^[3][4][5] NoSQL databases are increasingly used in big data and real-time web applications.^[6] NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages, or sit alongside SQL databases in a polyglot persistence architecture.^[7][8]

Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as "more flexible" than relational database tables.
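As a rough illustration of the four data structures named above, the same small piece of information can be written in each shape using plain Python literals. The keys, field names and values are invented for the example, and real products differ in the details.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "phone": "555-0100"}'}

# Document: a self-describing, nested record; fields can vary per document.
document = {
    "_id": 42,
    "name": "Ada",
    "phones": [{"type": "home", "number": "555-0100"}],
}

# Wide column: row key -> column family -> column -> value.
wide_column = {
    "user42": {
        "profile": {"name": "Ada"},
        "contact": {"home_phone": "555-0100"},
    }
}

# Graph: nodes plus explicitly stored relationships (edges).
graph = {
    "nodes": {"ada": {"label": "Person"}, "acme": {"label": "Company"}},
    "edges": [("ada", "WORKS_AT", "acme")],
}

# Each shape makes a different access path cheap.
home_phone = document["phones"][0]["number"]
employers = [dst for src, rel, dst in graph["edges"]
             if src == "ada" and rel == "WORKS_AT"]
```

None of these needs a fixed table schema, which is what "more flexible" usually refers to.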

Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages instead of SQL (for instance, the lack of ability to perform ad-hoc joins across tables), lack of standardized interfaces, and huge previous investments in existing relational databases.^[10] Most NoSQL stores lack true ACID transactions, although a few databases have made them central to their designs.

Instead, most NoSQL databases offer a concept of "eventual consistency" in which database changes are propagated to all nodes "eventually" (typically within milliseconds) so queries for data might not return updated data immediately or might result in reading data that is not accurate, a problem known as stale reads.^[11] Additionally, some NoSQL systems may exhibit lost writes and other forms of data loss.^[12] Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss.^[13] For distributed transaction processing across multiple databases, data consistency is an even bigger challenge that is difficult for both NoSQL and relational databases. Even current relational databases "do not allow referential integrity constraints to span databases." Few systems maintain both ACID transactions and X/Open XA standards for distributed transaction processing.
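A stale read can be shown with a toy model. The class below is a hypothetical in-memory simulation, not the behaviour of any particular NoSQL product: writes land on the first replica and reach the others only when propagate() is called.

```python
class EventuallyConsistentStore:
    """Toy model: replica 0 accepts writes, the others catch up later."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        """Apply queued writes, in order, to every follower replica."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.propagate()

store.write("balance", 80)
stale = store.read("balance", replica_index=1)  # follower not yet updated
store.propagate()
fresh = store.read("balance", replica_index=1)  # follower has caught up
```

Between the second write and the second propagate() call, a client that happens to read from the follower sees the old value, which is exactly the stale read described above.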

13. Discuss what Hadoop is, explaining the core concepts of it

14. Explain the concept of IR, identifying tools for IR

Hadoop

Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware^[3] (still the common use), it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
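The MapReduce model itself can be illustrated without a cluster. The sketch below runs the map, shuffle and reduce steps of a word count in a single Python process; the three strings stand in for file blocks that Hadoop would place on different nodes, and the function names are illustrative rather than part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Each string plays the role of a block stored on a different node.
blocks = ["big data big clusters", "data locality matters", "big data"]

# On a cluster the map calls would run in parallel, next to the data.
mapped = [pair for block in blocks for pair in map_phase(block)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call only needs its own block, the work can be sent to whichever node already holds that block, which is the data locality idea described above.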

The base Apache Hadoop framework is composed of the following modules:

·         Hadoop Common – contains libraries and utilities needed by other Hadoop modules;

·         Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;

·         Hadoop YARN – a platform, introduced in 2012, responsible for managing computing resources in clusters and using them for scheduling users' applications;

·         Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing.

The term Hadoop is often used for both base modules and sub-modules and also the ecosystem, or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.

Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on MapReduce and Google File System.

The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command line utilities written as shell scripts. Though MapReduce Java code is common, any programming language can be used with Hadoop Streaming to implement the map and reduce parts of the user's program.^[14] Other projects in the Hadoop ecosystem expose richer user interfaces.

Information retrieval

Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.

Automated information retrieval systems are used to reduce what has been called information overload. An IR system is software that provides access to books, journals and other documents, and stores and manages those documents. Web search engines are the most visible IR applications.

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy.

An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information. However, as opposed to classical SQL queries of a database, in information retrieval the results returned may or may not match the query, so results are typically ranked. This ranking of results is a key difference of information retrieval searching compared to database searching.^[1]

Depending on the application the data objects may be, for example, text documents, images,^[2] audio,^[3] mind maps^[4] or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata.

Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.
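The score-then-rank idea can be sketched with the simplest possible scoring function, a count of how often the query terms occur in each document. This is only an assumption made for illustration; real IR systems use more refined weighting, and the document ids and texts below are made up.

```python
from collections import Counter


def score(query, document):
    """Numeric score: total occurrences of the query terms in the document."""
    term_counts = Counter(document.lower().split())
    return sum(term_counts[term] for term in query.lower().split())


def rank(query, documents, top_k=3):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    scored = [(score(query, text), doc_id) for doc_id, text in documents.items()]
    matching = [(s, doc_id) for s, doc_id in scored if s > 0]
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, s) for s, doc_id in matching[:top_k]]


documents = {
    "doc1": "hadoop stores data in hdfs",
    "doc2": "information retrieval ranks documents by relevance",
    "doc3": "retrieval of information from text and information overload",
}

results = rank("information retrieval", documents)
```

Unlike an SQL query, which returns exactly the rows that satisfy its condition, both matching documents come back here, ordered by how well they match.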

Asanga's Assignment

Data is collected and stored on personal computers, so the database is small and easily manageable. The data is generally used by the same department of an organization and is accessed by a small group of people.

These are the paid versions of huge databases designed for users who want to access the information for help. These databases are subject specific, and one cannot afford to maintain such a large volume of information oneself. Access to such databases is provided through commercial links.

Information related to the operations of an enterprise is stored inside this database. Functional lines such as marketing, employee relations and customer service require this kind of database.
