Functions of a Computer

Any computer system can perform five basic functions.

Input
A computer can accept input data for the purpose of processing. This is called inputting.

Storing
Input data can be saved so that it is available for initial or additional processing as and when required. This is called storing.

Processing
Performing basic arithmetic or logical operations on data in order to convert the input data into the required, useful information is known as processing.

Output
It is the process of producing useful information or results for the person or device, such as a printed report or a visual display. The output can also be the input for a control system.

Controlling
Directing the manner and sequence in which all of the above operations are performed is known as controlling.

Computer Overview

Introduction
The word 'computer' comes from the word 'compute', which means to calculate. So, a computer is normally considered to be a calculating device that can speedily perform arithmetic and logical operations.
The original objective of inventing the computer was to create a fast calculating machine, though a major part of the work done by computers nowadays is non-mathematical. Therefore, defining a computer only as a calculating device is not justified.
The computer is an electronic device designed to accept and store input data, manipulate it and output results under the direction of detailed, step-by-step stored programs and instructions.

Data
It denotes raw facts and figures, such as numbers, words, amounts and quantities, that can be processed, manipulated or produced by the computer. For example: Rita, 18, XI B. This is raw data.

Information
It is a meaningful and arranged form of data. Raw data does not make any sense on its own, so it has to be arranged in a meaningful manner such that it makes sense. For example, information that makes sense would be: Rita, aged 18, is in class XI B.

Hardware and Software
A computer consists of two fundamental components: one is called hardware, the other software. Hardware refers to the physical components, for example, the CPU, memory, and input and output devices. Software is the set of instructions designed to operate the concerned hardware, for example, MS-DOS, Microsoft Office, etc. In fact, we can divide software into two broad categories:
1) System Software: It controls the basic functioning of a computer system. It consists of operating systems, compilers, translators, etc.
2) Application Software: The basic aim of making and using a computer is to get work done from it. So, programs which are developed in order to serve a particular application are known as application software. For example, Microsoft Office, Tally, etc.

Lossless compression algorithms

  1. run-length encoding (also known as RLE)
  2. dictionary coders:
    • LZ77 & LZ78
    • LZW
  3. Burrows-Wheeler transform (also known as BWT)
  4. prediction by partial matching (also known as PPM)
  5. context mixing (also known as CM)
  6. entropy encoding:
    • Huffman coding (simple entropy coding; commonly used as the final stage of compression)
    • Adaptive Huffman coding
    • Shannon-Fano coding
    • arithmetic coding (more advanced)
      • range encoding (same as arithmetic coding, but looked at in a slightly different way)
Run-length encoding
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons and line drawings.
For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB
If we apply a simple run-length code to the above hypothetical scan line, we get the following:
12WB12W3B24WB
Interpret this as twelve W's, one B, twelve W's, three B's, etc.
The run-length code represents the original 53 characters in only 13. Of course, the actual format used for the storage of images is generally binary rather than ASCII characters like this, but the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as deflation often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).
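A minimal sketch of such an encoder in Visual Basic is given below; the function name RunLengthEncode is illustrative, and single characters are emitted without a count so that the output matches the example above. Applied to the hypothetical scan line, it returns 12WB12W3B24WB.

Function RunLengthEncode(ByVal s As String) As String
    ' Collapse each run of identical characters into "<count><character>";
    ' runs of length 1 are written as the bare character.
    Dim result As String
    Dim i As Long, count As Long
    Dim ch As String
    i = 1
    Do While i <= Len(s)
        ch = Mid$(s, i, 1)
        count = 1
        Do While i + count <= Len(s) And Mid$(s, i + count, 1) = ch
            count = count + 1
        Loop
        If count > 1 Then
            result = result & CStr(count) & ch
        Else
            result = result & ch
        End If
        i = i + count
    Loop
    RunLengthEncode = result
End Function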
Run-length encoding performs lossless data compression and is well suited to palette-based iconic images. It does not work well at all on continuous-tone images such as photographs, although JPEG uses it quite effectively on the coefficients that remain after transforming and quantizing image blocks. RLE is used in fax machines (combined with other techniques into Modified Huffman coding). It is relatively efficient because most faxed documents are mostly white space, with occasional interruptions of black.
Data that have long sequential runs of bytes (such as lower-quality sound samples) can be RLE compressed after applying a predictive filter such as delta encoding.

Dictionary coder
A dictionary coder, also sometimes known as a substitution coder, is any of a number of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary') maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure.
Some dictionary coders use a 'static dictionary', one whose full set of strings is determined before coding begins and does not change during the coding process. This approach is most often used when the message or set of messages to be encoded is fixed and large; for instance, the many software packages that store the contents of the Bible in the limited storage space of a PDA generally build a static dictionary from a concordance of the text and then use that dictionary to compress the verses.
More common are methods where the dictionary starts in some predetermined state but the contents change during the encoding process, based on the data that has already been encoded. Both the LZ77 and LZ78 algorithms work on this principle. In LZ77, a data structure called the "sliding window" is used to hold the last N bytes of data processed; this window serves as the dictionary, effectively storing every substring that has appeared in the past N bytes as dictionary entries. Instead of a single index identifying a dictionary entry, two values are needed: the length, indicating the length of the matched text, and the offset (also called the distance), indicating that the match is found in the sliding window starting offset bytes before the current text.
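As a rough sketch of this sliding-window idea in Visual Basic (the textual "(offset,length)" token, the minimum match length of 3 and the function name are simplifications for readability, not the actual LZ77 output format):

Function LZ77Encode(ByVal s As String, ByVal windowSize As Long) As String
    ' For each position, look back at most windowSize characters for the longest
    ' match; emit "(offset,length)" for matches of 3 or more characters,
    ' otherwise emit the literal character.
    Dim i As Long, j As Long, k As Long
    Dim bestLen As Long, bestOff As Long
    Dim result As String
    i = 1
    Do While i <= Len(s)
        bestLen = 0: bestOff = 0
        For j = IIf(i - windowSize < 1, 1, i - windowSize) To i - 1
            k = 0
            Do While i + k <= Len(s) And Mid$(s, j + k, 1) = Mid$(s, i + k, 1)
                k = k + 1
            Loop
            If k > bestLen Then
                bestLen = k
                bestOff = i - j
            End If
        Next j
        If bestLen >= 3 Then
            result = result & "(" & bestOff & "," & bestLen & ")"
            i = i + bestLen
        Else
            result = result & Mid$(s, i, 1)
            i = i + 1
        End If
    Loop
    LZ77Encode = result
End Function

For the string BWWBWWBWWBWW mentioned earlier, a call such as LZ77Encode("BWWBWWBWWBWW", 32) emits the first three characters literally and then a single (3,9) token covering the rest of the string.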

PPM compression algorithm
PPM is an adaptive statistical data compression technique based on context modeling and prediction. The name stands for Prediction by Partial Matching. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream.
Predictions are usually reduced to symbol rankings. The number of previous symbols, n, determines the order of the PPM model, which is denoted as PPM(n). Unbounded variants, where the context has no length limitations, also exist and are denoted as PPM*. If no prediction can be made based on all n context symbols, a prediction is attempted with just n-1 symbols. This process is repeated until a match is found or no more symbols remain in the context. At that point a fixed prediction is made. This process is the inverse of that followed by DMC compression algorithms (Dynamic Markov Chain), which build up from a zero-order model.
Much of the work in optimizing a PPM model is handling inputs that have not already occurred in the input stream. The obvious way to handle them is to create a "never-seen" symbol which triggers the escape sequence. But what probability should be assigned to a symbol that has never been seen? This is called the zero-frequency problem. One variant assigns the "never-seen" symbol a fixed pseudo-hit count of one. A variant called PPM-D increments the pseudo-hit count of the "never-seen" symbol every time the "never-seen" symbol is used. (In other words, PPM-D estimates the probability of a new symbol as the ratio of the number of unique symbols to the total number of symbols observed).
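Under that estimate, the probability assigned to a never-seen symbol in a context could be computed as sketched below; this is only an illustration of the ratio described above, not code from any particular PPM implementation:

Function NewSymbolProbability(ByVal uniqueSymbols As Long, ByVal totalSymbols As Long) As Double
    ' Ratio of the number of distinct symbols seen so far to the total number observed.
    If totalSymbols = 0 Then
        NewSymbolProbability = 1#   ' nothing seen yet in this context, so a new symbol is certain
    Else
        NewSymbolProbability = uniqueSymbols / totalSymbols
    End If
End Function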
PPM compression implementations vary greatly in other details. The actual symbol selection is usually recorded using arithmetic coding, though it is also possible to use Huffman encoding or even some type of dictionary coding technique. The underlying model used in most PPM algorithms can also be extended to predict multiple symbols. It is also possible to use non-Markov modeling to either replace or supplement Markov modeling. The symbol size is usually static, typically a single byte, which makes generic handling of any file format easy.
Published research on this family of algorithms can be found as far back as the mid-1980s. Software implementations were not popular until the early 1990s because PPM algorithms require a significant amount of RAM. Recent PPM implementations are among the best-performing lossless compression programs for natural language text.

Context mixing
Context mixing is a type of data compression algorithm in which the next-symbol predictions of two or more statistical models are combined to yield a prediction that is often more accurate than any of the individual predictions. For example, one simple method (not necessarily the best) is to average the probabilities assigned by each model. Combining models is an active area of research in machine learning.
The PAQ series of data compression programs use context mixing to assign probabilities to individual bits of the input.
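A minimal sketch of the simple averaging scheme mentioned above, applied to the bit-level predictions of two models, might look like this in Visual Basic (the function name is illustrative):

Function MixPredictions(ByVal p1 As Double, ByVal p2 As Double) As Double
    ' p1 and p2 are each model's estimated probability that the next bit is a 1;
    ' the combined prediction is simply their average.
    MixPredictions = (p1 + p2) / 2#
End Function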

Entropy encoding
An entropy encoding is a coding scheme that assigns codes to symbols so as to match code lengths with the probabilities of the symbols. Typically, entropy encoders are used to compress data by replacing symbols represented by equal-length codes with symbols represented by codes where the length of each codeword is proportional to the negative logarithm of the probability. Therefore, the most common symbols use the shortest codes.
According to Shannon's source coding theorem, the optimal code length for a symbol is -log_b(P), where b is the number of symbols used to make the output codes and P is the probability of the input symbol.
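For example, a symbol with probability P = 0.25 in a binary code (b = 2) has an optimal length of 2 bits. A tiny helper that evaluates the formula, given here only as an illustration, could look like this in Visual Basic:

Function OptimalCodeLength(ByVal p As Double, ByVal b As Double) As Double
    ' -log_b(p), computed with natural logarithms (VB's Log function is the natural logarithm)
    OptimalCodeLength = -(Log(p) / Log(b))
End Function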
Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code such as unary coding, Elias gamma coding, Fibonacci coding, Golomb coding, or Rice coding may be useful.

Huffman coding
In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to do this in linear time if input probabilities (also known as weights) are sorted.
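As a small worked example with made-up probabilities, consider four symbols A, B, C and D with probabilities 0.4, 0.3, 0.2 and 0.1. Huffman's algorithm repeatedly merges the two least probable items: C and D are merged into a node of weight 0.3, that node and B are merged into a node of weight 0.6, and finally that node and A are merged. Reading the resulting tree gives codes such as A = 0, B = 10, C = 110 and D = 111, so the average code length is 0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9 bits per symbol, compared with 2 bits per symbol for a fixed-length code over four symbols.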
For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix-free codes that the term "Huffman code" is widely used as a synonym for "prefix-free code" even when such a code is not produced by Huffman's algorithm.
Although Huffman coding is optimal for symbol-by-symbol coding with a known input probability distribution, its optimality can sometimes accidentally be overstated. For example, arithmetic coding and LZW coding often have better compression capability. Both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known.

Adaptive Huffman coding
Adaptive Huffman coding is an adaptive coding technique based on Huffman coding. It builds the code as the symbols are being transmitted, with no initial knowledge of the source distribution, which allows one-pass encoding and adaptation to changing conditions in the data. The benefit of the one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss can ruin the whole code.

Arithmetic coding
Arithmetic coding is a method for lossless data compression. It is a form of entropy encoding, but where other entropy encoding techniques separate the input message into its component symbols and replace each symbol with a code word, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0.
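As a small worked example with made-up probabilities, suppose the alphabet has two symbols, A with probability 0.6 and B with probability 0.4, and A's sub-interval is placed first. Encoding the message AB starts with the interval [0.0, 1.0); reading A narrows it to [0.0, 0.6), and reading B then narrows it to the last 40% of that interval, [0.36, 0.6). Any single number inside [0.36, 0.6), for instance 0.4, now represents the whole message.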

Burrows-Wheeler transform
The Burrows-Wheeler transform (BWT, also called block-sorting compression) is an algorithm used in data compression techniques such as bzip2. It was invented by Michael Burrows and David Wheeler.
When a character string is transformed by the BWT, none of its characters change value. The transformation rearranges the order of the characters. If the original string had several substrings that occurred often, then the transformed string will have several places where a single character is repeated multiple times in a row. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding.
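As a small worked example, take the string "banana" with an end-of-string marker "$" appended so that the transform can be reversed. Listing all rotations of "banana$", sorting them, and reading the last character of each sorted rotation yields "annb$aa": the two n's are now adjacent and two of the three a's sit together at the end, which is exactly the kind of clustering that move-to-front and run-length encoding can exploit.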

Data compression theory and algorithms

Data compression
In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes. For example, this article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp". One popular instance of compression that many computer users are familiar with is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many files in a single output file.
Compression is useful because it helps reduce the consumption of expensive resources, such as disk space or transmission bandwidth. On the downside, compressed data must be uncompressed to be viewed (or heard), and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (you always have the option of decompressing the video in full before you watch it, but this is inconvenient and requires storage space for the uncompressed video). The design of data compression schemes therefore involves trade-offs between various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and uncompress the data.
Applications
One very simple means of compression is run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression. It is often used to better use disk space on office computers, or better use the connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).
For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the limitations of the human sensory system, a great deal of space can be saved while producing an output which is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way tradeoff between compression speed, compressed data size and quality loss.
Lossy image compression is used in digital cameras, greatly increasing their storage capacities while hardly degrading picture quality at all. Similarly, DVDs use the lossy MPEG-2 codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping and is decoded by MP3 players.

Types of Data Compression

There are two types of data compression:

1. Lossy Compression
It is also known as perceptual coding.
When we apply this kind of data compression to a message, the message may not be recovered exactly as it was before the compression.
This type of data compression is used only when loss is acceptable.
This type of compression is ideal for achieving higher compression.
It is not preferable for critical data like textual data. It is most useful for Digitally Sampled Analog Data (DSAD).
DSAD generally consists of sound, video, graphics or picture files.
For example, a sound file can contain very high and very low frequencies that the human ear cannot hear; these may be truncated from the file.
A picture created in MS-Paint can be saved in different formats like .bmp, .gif or .jpg, and it occupies a different amount of space in each.
Different formats use different techniques to store the same image.
The size of each format shows that as the size of the picture reduces, the loss of data increases.

2. Lossless Compression
In this compression, the original data can be reconstructed after transmission or after decompression.
Here the original data can be decoded exactly.
It works by finding repeated patterns in a message and encoding those patterns in an efficient manner.
So lossless data compression is also referred to as 'redundancy reduction'.
This type of compression may not work well on random messages, as it depends on the patterns in the message. Lossless data compression is ideal for textual information.

Data Compression

Definition
The word 'compression' means 'reduction'.
Data compression is a process of encoding information using a specific encoding scheme.
It is used for storing data as well as for transmission.
When we receive the data, it should be converted into its original form. Sometimes we may not be able to recover the original form of the information after decoding the compressed data, especially in the case of pictures.
The goal of data compression is to store an information source as accurately as possible using the fewest number of bits.

Need for data compression
It is useful for reducing the consumption of expensive resources like hard disk space (secondary storage devices) or transmission bandwidth.
The design of data compression schemes involves trade-offs among various factors like the degree of compression, the amount of distortion introduced and the computational resources required to compress and uncompress the data.
To reduce the storage space.
To increase the capacity of the communication channel.
Information security - Data compression often changes the format of the original message. So it can be used for sensitive data to make it secure to some extent.

Backup
Most organizations maintain a duplicate set of their huge data, known as a backup.
A backup needs huge storage space because, after a certain period of time, a fresh copy of the original data has to be kept. Data compression is also widely used in backup utilities.


Date and Time Functions in VB

Not only does Visual Basic let you store date and time information in the specific Date data type, it also provides a lot of date- and time-related functions. These functions are very important in all business applications and deserve an in-depth look. Dates and times are internally stored as numbers in Visual Basic; the decimal (fractional) part represents the time between 0:00:00 and 23:59:59 hours, inclusive.

The system's current date and time can be retrieved using the Now, Date and Time functions in Visual Basic. The Now function retrieves both the date and the time, while the Date function retrieves only the date and the Time function retrieves only the time.
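A minimal illustration is shown below; the printed values depend on the system clock, so those in the comments are only examples.

Dim currentStamp As Date
currentStamp = Now      ' both date and time, e.g. 21/06/2024 10:30:45
Debug.Print Date        ' the current system date only
Debug.Print Time        ' the current system time only
Debug.Print currentStamp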




Constants, Data Type Conversion, Visual Basic Built-in Functions

Constants are named storage locations in memory whose value does not change during program execution; they remain the same throughout the run of the program. When the user wants to use a value that never changes, a constant can be declared and created. The Const statement is used to create a constant. Constants can be declared in local, form, module or global scope and can be public or private, just as for variables. Constants can be declared as illustrated below.

Public Const gravityconstant As Single = 9.81

Predefined Visual Basic Constants
The predefined constants can be used anywhere in the code in place of the actual numeric values. This makes the code easier to read and write.
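For example, the built-in constants vbCrLf (a carriage return plus line feed) and vbRed (the colour value &HFF) can replace raw numeric values. The snippet below is a small illustration and assumes a form with a label control named Label1:

Dim msg As String
msg = "First line" & vbCrLf & "Second line" ' vbCrLf inserts a line break
Label1.ForeColor = vbRed                    ' far clearer than writing &HFF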

User-Defined Data Types in Visual Basic 6

Variables of different data types, combined into a single variable to hold several related pieces of information, form a user-defined data type.
A Type statement is used to define a user-defined type in the General declarations section of a form or module. User-defined data types can only be private in a form module, while in standard modules they can be public or private. An example of a user-defined data type to hold product details is given below.

Private Type ProductDetails
    ProdID As String
    ProdName As String
    Price As Currency
End Type

The user-defined data type can be declared as a variable using the Dim statement, as in any other variable declaration statement. An array of these user-defined data types can also be declared. An example to consolidate these two features is given below.

Dim ElectronicGoods As ProductDetails ' One record
Dim ElectronicGoods(10) As ProductDetails ' An array of 11 records
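The individual fields of the record are then accessed with the dot operator. For instance, assuming the single-record declaration above, the values below being only sample data for illustration:

ElectronicGoods.ProdID = "P101"
ElectronicGoods.ProdName = "Television"
ElectronicGoods.Price = 15000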