Contents


    Systems Architecture

      Introduction

        In this part of the course we look at ways to describe a system in terms of its hardware, software, and people. We will start by reviewing some of the options for hardware. After that we will look at techniques for documenting many different systems: current, proposed, planned, and implemented.

        Physical and Logical Models

        Systems Architecture is strongly oriented towards physical models of the current and the future systems. We use the same diagrams to show how the system actually is and how it should be. A Systems Architecture defines a physical model that (hopefully) will implement or realize the more abstract logical models. It should also have the right qualities: speed, security, reliability, and so on. Logically we consider the options under four headings:

        [Input -> Processor -> Output + Connections]

        We will cover the options for Input, Connections, Output, Storage, and Processors before looking at tools for expressing architectures.

        First, however, some problems and patterns. Be aware that computer hardware is very versatile: it can simulate other pieces of hardware. This is often called virtualization. For example, most computers these days have their own machine code, but someone has written a program called the JVM (Java Virtual Machine) that lets the machine execute Java byte code. Another example (that goes back to the 60's) is virtual memory, which uses disks to simulate RAM. The result is a slightly slower computer with a lot more apparent memory. For 40 years, operating systems have allowed many programs to run at the same time on one piece of hardware. If each program seems to have its own computer -- and cannot access another program's data -- then we have a virtualization system running under a hypervisor. This makes it very easy to move programs from one machine to another and to use all the power of an expensive piece of hardware. There has been a strong virtualization trend in the last ten years.

        Problem -- Keeping up with new hardware

        New hardware appears every month. It pays a computer professional to keep an eye on new developments. I don't know of a really good source on the web.... but I do follow [ http://slashdot.com ] and [ http://www.wired.com ] as straws in the wind.

        I subscribe to a blog entitled "Coding Horror" that often has interesting and enlightening comments on software development. As an example, when you have time, check out this article [ 001003.html ] describing the ways in which system and software architecture go astray and ultimately become unmanageable.

        A better technique is to join a professional group like the ACM -- Association for Computing Machinery -- and/or the IEEE-CS -- the Institute of Electrical and Electronics Engineers Computer Society. Both offer student memberships, and each gives a discount if you are a member of the other. Both publish excellent magazines and journals, which are in turn available on this campus as electronic digital libraries. On campus, you can check the societies out for nothing at [ http://www.acm.org/ ] and [ index.jsp ] and even drill down into their digital libraries.

        Old technology

        Be aware that old technology is kept in use for a long time. You are likely to meet examples of old devices [ a3.dotmatrix.html ] still in use in a current system. Always find out why -- sometimes the old device is still the best solution or the only solution to a systems problem. If, and only if, there is no reason -- the old device can become a focus for change.

        Here is a story, from SlashDot, about the use of older computers [ http://www.silicon.com/management/public-sector/2010/09/25/space-exploration-the-computers-that-power-mans-conquest-of-the-stars-39746245/ ] (regular) [ http://m.silicon.com/management/public-sector/2010/09/25/space-exploration-the-computers-that-power-mans-conquest-of-the-stars-39746245/ ] (mobile).

        Here is another example. I learned to program on my college's Elliott 803 minicomputer. When I became a lecturer, 7 years later, the CS Department still had an Elliott 803, and the university was using another one as a peripheral controller. I ran the department's 803, at a profit, for 5 years. It cost roughly $400 per year to maintain and a company paid $800 to use it. They had no choice because they used software that couldn't be ported to another machine. Meanwhile the department used it for experiments.

        Pattern -- Technology Adoption

        1. Wild Idea
        2. Laboratory demonstration
        3. Hype
        4. Early Adoption
        5. Mainstream -- Competing similar technologies
        6. Obsolete

        Pattern -- Four Classic Architectures

          Follow the links to Wikipedia articles if you need more information.
        1. Mainframe [ Mainframe_computer ]
        2. Peer-to-Peer [ Peer_to_peer ]
        3. Client-Server [ Client-Server ]
        4. Cloud Computing [ Cloud_computing ]

      . . . . . . . . . ( end of section Introduction) <<Contents | End>>

      Processors

        Here is a list of processor types:
        1. Supercomputer
        2. Mainframe...enterprise server
        3. Clusters, Grids, Clouds,...: many processors+memories in a tight fast network.
        4. Multicore PCs -- several CPUs on one chip. More power with the same clock speed and heat.
        5. Many PCs in a rack -- shared display and keyboard.
        6. PC -- Personal Computers (Apple ][, IBM PC, ...)
        7. Laptops -- PCs that can be carried
        8. Tablets -- Laptops minus the keyboard
        9. Palmtops (Palm, iPod, ...) and Game consoles
        10. Embedded Chip
        11. Special purpose processors: GPU (Graphics), Peripherals, ...

      1. The key difference to the user is not the hardware so much as the operating system. Here again there is an incredible range of possibilities.

        Operating Systems

        1. Mainframes and Minis had their own special OSs. To find out about these -- ask any faculty member!
        2. UNIX: AT&T, BSD, Linux, MacOS X, iOS, Android... If you want the history the Wikipedia [ Unix ] article is quite good.
        3. The MS Family: DOS (Disk OS), Windows 3.0, Windows98, Windows2k, Windows XP, Windows Vista, Windows 7, etc. Again, for details (if you want) check out [ Microsoft_DOS ] [ Microsoft_Windows ] on the Wikipedia.

        . . . . . . . . . ( end of section Operating Systems) <<Contents | End>>

      . . . . . . . . . ( end of section Processors) <<Contents | End>>

      Input

        We can classify I/O (Input and Output) devices/options in several ways:
      1. Form factor
        • Embedded chip/board or Circuit
        • Cell Phone: may become the one peripheral that everybody in the world owns. Their functionality is increasing under strong competitive pressure. Go to any mall and play with the demo machines! We have had Blackberries and Palm Treos for some time. Now we have the iPhone, iPod Touch, and Google's Android (2008)... UMTS -- the Universal Mobile Telecommunications System [ UMTS ] -- appeared in 2007. Hard to tell which will be the winner and which the whiner!
        • Normal phone
        • Hand Held Device
          • Hand held bar code reader
          • Game controller
          • Palmtop/PDA/cell phone/MP3 player/Zune/...
        • Tablet
        • Laptop/Terminal
        • Workstation/PC
        • Special purpose workstation -- e.g. Point Of Sale (POS)

      2. Input technology [ Input_device ] (Wikipedia).
        • Keyboard
        • Radio Frequency IDentification [ RFID ] (Wikipedia).
        • Micro-technology embedded in bodies: medical uses!
        • Headgear can read eye positions -- either with an infra-red beam or by reading signals to muscles.
        • Sound and (more complex) speech.
        • Body measurements -- Biometrics. These are technologies that extract information by measuring the body. They are mostly used to ID people. They include: finger print and palm print readers, iris scans, retinal scanners, ... The earliest (from Doug Engelbart) was to use people's weight to recognize and log them in!
        • Data capture devices: eg. bar code readers [ Barcode_reader ] [ Barcode ] (Wikipedia).
        • Digital camera
        • Smart phone with camera and QR app [ QR_code ] (Wikipedia).
        • Electronic Whiteboards
        • Graphics
          1. Stylus/pen-based
          2. Mouse

        • Touch screen: Originally you had a screen or tablet plus a special pen. Some use a magnetic pen (WACOM). Special coatings can also be used, or double layers pushed together. A common example is the PalmOS driven devices. I'm not sure how they digitize the pen movements -- but they can work well (I have had a slow spot in one part of the screen, or a digitizer that reads taps as strokes and misreads the position on the screen by about 0.1 inches). Now some technologies let you use a finger -- very popular for kiosk machines like ATMs and voting, and now the impressive Apple Touch interface. Other manufacturers are introducing their own touch devices.
        • (Magnetic Ink Character Recognition): think checks! [ MICR ] (Wikipedia).
        • Scanners [ Image_scanner ]
        • OCR::= [ Optical_character_recognition ] -- On a special font it is excellent, and on a fixed known font it is quite good, but scanning regular text with its many fonts, typefaces, wrinkles in the paper, spots and so on, OCR only gets 80% to 90% accuracy. The key technique is to convert the image into a two-dimensional array of points and try to match parts of it with known templates (a sketch of this idea appears after this list). There are improvements on this using special data structures and algorithms. I'm not sure where to get the nitty-gritty details. Some forms of OCR input can even handle hand-printed numbers.
        • Cards -- old [ Computer_punch_card ] (Wikipedia). , new [ Magnetic_stripe_card ] (Wikipedia). , and smart [ Smart_card ] (Wikipedia).
        • Phone Keyboard -- a 4×3 array of 12 buttons plus some special keys.
        • Voice input and Speech recognition: in my experience flaky, and it typically needs training. Possible exception: Chinese. Chinese uses inflection and tone to communicate meaning, and voice recognition technology tends to react well to inflection and tone. (A result of a simple experiment in the CSUSB CSCI dept in the 1980's.) Speech recognition is a good way to input data when the hands are busy and the vocabulary is small, fixed, and discrete. Recognizing normal speech is less effective -- and the technology is probably proprietary (= secret). Most techniques are based on separating out the different frequencies of sound that make up the signal: Fourier or Spectrum Analysis (a small sketch follows this list). This gives patterns that can be correctly recognized in many cases. However, even recognizing where one word begins and another ends in normal speech turns out to be very difficult. It doesn't help that in normal speech we run words together and omit sounds that are supposed to be there. For details try the Wikipedia [ Speech_recognition ]

        • Game controllers: hand held, buttons, joysticks, ... forms of motion sensing, including the Wii Controller
        • Motion Sensing devices: iPod touch, [ Wii_Remote ] [ Wii Dupe.shtml ]
        • Haptic devices -- you hold and manipulate and get force feedback.
        • Manual push buttons, switches, knobs, rollers, etc.
        • CD-ROM -- Compact Disk Read Only Memory, and lately DVD...
        • Sound
        • Many other sensors -- eg. detecting particular molecules, ionization, humidity, pressure, temperature, ... -- all depending on Analog to Digital conversion.
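
        The OCR item above promised a sketch of template matching: treat the scanned image as a two-dimensional array of points and compare it against known glyph templates. Here is a minimal illustration in Python; the 3x3 glyphs are invented for the example, and real OCR uses far larger images and cleverer matching.

          # Minimal template-matching sketch: 1 = ink, 0 = paper. Made-up 3x3 glyphs.
          TEMPLATES = {
              "I": ((0,1,0),
                    (0,1,0),
                    (0,1,0)),
              "L": ((1,0,0),
                    (1,0,0),
                    (1,1,1)),
          }

          def score(cell, template):
              """Fraction of points where the scanned cell agrees with the template."""
              matches = sum(c == t for row_c, row_t in zip(cell, template)
                                   for c, t in zip(row_c, row_t))
              return matches / 9

          def recognize(cell):
              """Return (score, character) for the best-matching template."""
              return max((score(cell, t), ch) for ch, t in TEMPLATES.items())

          scanned = ((1,0,0),
                     (1,0,0),
                     (1,1,0))           # a slightly smudged "L"
          print(recognize(scanned))     # -> (0.888..., 'L')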

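        The voice input item above also promised a sketch of the spectrum-analysis step. This one just finds the strongest frequencies in a synthetic two-tone "sound" using numpy -- nowhere near recognizing words, but it is the kind of pattern the recognizers start from.

          # Spectrum analysis sketch: find the dominant frequencies in a made-up signal.
          import numpy as np

          rate = 8000                              # samples per second
          t = np.arange(rate) / rate               # one second of samples
          signal = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*1000*t)

          spectrum = np.abs(np.fft.rfft(signal))   # strength of each frequency
          freqs = np.fft.rfftfreq(len(signal), d=1/rate)
          top_two = freqs[np.argsort(spectrum)[-2:]]
          print(sorted(top_two))                   # -> [440.0, 1000.0]
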
      . . . . . . . . . ( end of section Input) <<Contents | End>>

      Principle -- Gather input data as close to its creation as possible

      As a rule -- get the input into your system as close to where it is created as possible. Collect it automatically if possible. Once data has been captured and stored securely, keep it so that it does not have to be re-input. Note: making the user re-input data into a web form is a common design error in web systems.

      Principle -- you cannot trust user input

      The first thing a system must do when data is input is to verify and validate it. Failure to do this has led to embarrassment, security break-ins, loss of money, etc. etc. Not stringently checking the user's input is just like putting a "Welcome Mat" outside your front door, leaving the door unlocked, and going out for the night. Anything can happen....

      For example, I have a simple PHP script that searches this web site. The user inputs a string they are interested in and the script "GETs" it and searches a dictionary of important terms... All was well for 4 years, and then I got an email from the campus Information Security Office. They were checking every script on all the web servers and found that if you called the script directly, the script output an error message -- not too worrying, except that the error message contained the path to the script on the server. This is like handing out a map of your house showing that you have your expensive sound system behind a flimsy wall...

      I spent 24 hours patching this and 20 other scripts to avoid this problem.
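
      Here is a minimal sketch, in Python, of the kind of checking the principle above calls for. The field name and the rules are invented for illustration; a real system needs rules matched to each input and must never echo internal details (like file paths) back to the user.

        import re

        def validate_search_term(raw):
            """Verify and validate one piece of user input before using it."""
            if raw is None:                      # called with no input at all
                raise ValueError("missing search term")   # say nothing about the server
            term = raw.strip()
            if not (1 <= len(term) <= 50):       # reject absurd lengths
                raise ValueError("search term must be 1-50 characters")
            if not re.fullmatch(r"[A-Za-z0-9 _-]+", term):   # only expected characters
                raise ValueError("search term contains unexpected characters")
            return term

        # validate_search_term("data flow")     -> "data flow"
        # validate_search_term("../etc/passwd") -> ValueError, with no path leaked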

      Output

        Here [ Output_device ] is the Wikipedia summary.
      1. Types of Output: audio, fax, COM, COLD, EMail, Internet, Mobile, Special, printers, screens, sound, CD-RW, DVD-RW, ....
      2. Special: POS [ Point_of_sale ] , ATMs, special printers, plotters, photos, TVs, VCRs, Toasters, Blinking VCR displays, speakers, and earphones.
      3. Screens
      4. Printers: laser printer, page printer, line printers, ... [ Computer_printer ] [ Laser_printer ] [ Inkjet ] [ a3.dotmatrix.html ] [ Line_printer ]
      5. Special displays: lights, LEDs, LCDs, ...
      6. Mobile: cell phone, wireless PDA, ...
      7. EMail and Email attachments -- a simple way to get data from a computer to a remote or mobile user.
      8. Web page - open and insecure -- Again a simple way to share data that is not particularly secure.
      9. COLD [ Computer_Output_to_Laser_Disk ] (the predecessor to the CD-ROM and DVD).
      10. CD-ROM, DVDs, Blu-ray -- still a developing technology.
      11. COM -- Computer Output on Microfilm [ Microfilm ]
      12. Fax -- optional printer on many PCs/Macs.
      13. Audio -- These days speech is on a chip.

      Codes done later

      We will talk more about different ways of encoding data (EBCDIC, ASCII, Unicode, XML, .... ) later in the course.

      Storage

        The Principle of Locality

          Introduction

          The principle of locality is one of the most important principles for choosing and organizing data. It relates the design of data processing and software systems to their performance. Quite simply... where data is stored determines how fast it can be found and retrieved. So, the closer the data is to where it is processed, the faster the system can run. Similarly, when the sequence of data accesses moves from a position to a nearby one, the system will run faster. For example, consider a normal telephone directory/list of contacts... It is easy to find the phone number of a person... but try finding their neighbor's phone number in the same phone book. (No! You can't phone up the person and ask them for the name of their neighbor.)

          Or consider the old magnetic tape, which can retrieve (and write) data very quickly once it is moving at full speed, as long as you don't stop and start. You can pick up the next piece of data almost instantly, but it takes several minutes to go back to the beginning of the tape, or on to the end.
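
          The same effect can be seen inside a single computer. Here is a minimal sketch in Python: the two loops do identical work, but one visits the data in order and the other jumps around at random. How big the gap is depends on the hardware and the language, but the in-order loop is normally the faster one.

            import random, time

            data = list(range(2_000_000))
            in_order = list(range(len(data)))    # nearby accesses, one after another
            shuffled = in_order[:]
            random.shuffle(shuffled)             # the same accesses, scattered about

            def visit(order):
                start = time.perf_counter()
                total = 0
                for i in order:
                    total += data[i]
                return time.perf_counter() - start

            print("sequential:", visit(in_order))
            print("random    :", visit(shuffled))   # same work, worse locality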

          Story -- intern slows down group compilations 100 times

          I learned this when I first used a new magnetic tape based compiler in ICI (Yorkshire, England) in the 1960's. Immediately after my first compilation, all the compilations by my team were taking 4 or 5 minutes! It turned out that I shouldn't have asked the compiler to compile my program to tape until I had removed all the compile errors. A bug in the compiler left an incomplete file on the tape if the compiler halted on an error while writing to tape. It did not write an end-of-tape marker. As a result my compilations (and my team's compilations) involved spooling 200ft of tape to get to the "end". My name was mud! But the compiler team thanked me for finding the bug. And then said "don't do that again!"

          Disks

          Moving to disks did not change the principle of locality. When the operating system scatters a file all over the disk the computer slows down. This is called fragmentation. We have special programs to defragment disks. But clever data design can make parts of the software run much faster. If the data is read in a sequence that makes the disk head jump at random then each read has an average time proportional to the size of the data set. But, sequential access is faster and depends (on average) on how fast the disk moves not on how much data is stored.

          Networks

          The principle of locality also applies to networks. As Admiral Grace Hopper observed, light takes 1 nanosecond to travel about 11.7 inches. She used to hand out pieces of wire cut to this length. I have one in my office.

          She used to observe that even her fellow admirals would understand that there are a lot of nanoseconds from the ground to a satellite, and so one cannot instantly communicate with people on the other side of the world. The long delay is an example of latency. It can be a major pain in web applications: by the time your server has communicated with the user's client, they have lost attention. In some cases it is even worth having multiple copies of the data on many different servers so that it can be delivered rapidly to the processes that need it... but this needs subtle programming to make sure the different copies are synchronized.
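
          A quick back-of-the-envelope check of her point, assuming a geostationary relay roughly 36,000 km up (the numbers are rounded):

            SPEED_OF_LIGHT_KM_S = 300_000    # roughly
            GEO_ALTITUDE_KM = 36_000         # a geostationary satellite, roughly

            one_way = GEO_ALTITUDE_KM / SPEED_OF_LIGHT_KM_S   # ground up to the satellite
            round_trip = 4 * one_way         # up and down to ask, up and down to answer
            print(f"{round_trip:.2f} seconds")   # about 0.48 s before any processing at all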

          As an example, if you want to download a copy of Real Player you will be invited to choose the closest of several servers. Similarly, Netflix places its servers in the same buildings as various ISP hubs to reduce latency and increase bandwidth.

          Primary Memory and Cache

          The principle of locality even holds at the machine code level: (fastest) data in cache vs data in RAM, data in RAM vs data in virtual (disk) memory, ...(slowest).

          Conclusion

          The principle of locality means that there is a sequence that lets you access the data faster than other sequences. As a result, defragmentation is a key way to improve badly designed disk storage systems. Similarly, sorting data is a key technique for improving the performance of a computer system.

        Storage devices

          Size does matter -- faster and smaller .. slower and bigger

          The best device depends on many factors -- what you want it to do, cost, size, how much data, how fast, and how reliably, and how mobile, ...

          A key decision for a business is where to store its data. Unfortunately the best answer has to be worked out on a case-by-case basis and can even change as technology changes. For example, Daniel Truckenmiller's senior project [ seminar/20120608DarinTruckenmiller.txt ] (June 2012) turned on replacing a networked storage device. It took research and some simple mathematics to select the best way forward.

          List of types of storage

        1. Registers and cache in the CPU. Very fast, small, and transient.
        2. RAM/Primary memory/Core [ Random_access ] Fast, getting bigger, but transient.
        3. Memory chips for cameras and hand held devices.
        4. Flash drives -- portable storage of data. [ Flash_drive ] Portable SSD. The computer spy's best friend.
        5. Solid State Disks -- SSD [ SSD ] Slower than RAM but faster than disks. Persistent storage. Will fail after a large number of overwrites.
        6. Disks -- Direct access -- move the head and wait for the data to go by. [ Hard_disk ] [ Floppy_disk ] also Zip Disks, etc. Started out the size of a washing machine... Persistent storage. Survives until jostled or shut down incorrectly too often ( disk crash ).
        7. Optical storage devices: CDs,....
        8. Tapes -- Sequential Access -- but fast when you get up to speed. [ Magnetic_tape_data_storage ] Very Persistent storage.

        . . . . . . . . . ( end of section Storage devices) <<Contents | End>>

        Data Hierarchy

        1. Data Base -- A collection of linked files -- CMS.
        2. File -- a collection of records of one type -- all the student records we have.
        3. Record -- collection of elements referring to one entity -- Your student record.
        4. Element -- An indivisible atom of information -- example: student Id.
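
        The same four levels sketched as Python structures (the student fields are invented for illustration):

          element = "000123456"                        # Element: one indivisible item, a student Id

          record = {"student_id": element,             # Record: the elements describing one entity
                    "name": "A. Student",
                    "units": 12}

          file_of_students = [record]                  # File: many records of one type

          database = {"students": file_of_students,    # Data Base: linked files --
                      "courses": []}                   # course records would link back to student Ids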

      . . . . . . . . . ( end of section Storage) <<Contents | End>>

      Connections

        Notice that data flows between processes can be internal to a computer or through a network. You can even connect outputs to inputs. However one common and simple improvement to a system is to spot a place where a human re-inputs data that is produced by a computer. This tends to be slow and error-prone and something to be avoided except for a good reason -- like security.

        Attributes of connections

        You can quantify the behavior of a connection in terms of three key values. Wise computer people tend to think in these terms: Latency, Bandwidth, and Reliability.
          Latency is the delay between when a signal/message is sent, and when it arrives. Latency is a time measured in microseconds, milliseconds, seconds, minutes, ...

          Bandwidth is a measure of how much data you can transmit in a given time. Typically you have to wait for the first message (latency) and then the data starts flowing at the rate of the bandwidth. It is measured in terms of the amount of information that can be sent per unit of time -- for example bits per second, bytes per second, ... There is a special unit, the Baud, which is roughly bits per second. It is said that the Kludge Komputer Korporation had such bad connections that the salesmen quoted the bandwidth in cpf, which stood for characters per fortnight:-) Related to bandwidth is the user-level concept of Throughput -- the number of transactions that can be done in a given time.

          You can find a lot of useful reference materials by starting at [ Bandwidth_(computing) ] on the Wikipedia.
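
          A small worked example combining the two measures: the total time is roughly the latency plus the size of the data divided by the bandwidth. The figures below are invented for illustration.

            latency_s = 0.1                    # 100 ms before the first byte arrives
            bandwidth_bps = 10_000_000         # 10 megabits per second
            size_bits = 5 * 8_000_000          # a 5 megabyte file

            total_s = latency_s + size_bits / bandwidth_bps
            print(f"{total_s:.1f} seconds")    # 0.1 + 4.0 = 4.1 seconds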

          Reliability is a measure of how few errors are introduced when data is sent through the connection, and also of the chance of the connection being broken. Reliability is a complex property with no single measure. We can list the following problems with connections:

          1. Items are transmitted but are never received.
          2. Items are received but never transmitted.
          3. Items are transmitted and received more than once.
          4. Items are distorted as they move from transmitter to receiver.
          5. Items are received in a different order from the one in which they were sent.

          Working with a connection that has a high probability of one or more of the above faults is difficult. You may have to waste bandwidth to detect and/or correct the errors.
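
          One common way to spend that bandwidth is to send a check value along with the data; the receiver recomputes it and asks for a re-send when the two disagree. A minimal sketch using one parity bit per byte (it catches any single-bit distortion of a byte, but not every fault in the list above):

            def parity(byte):
                """1 if the byte has an odd number of 1 bits, else 0."""
                return bin(byte).count("1") % 2

            def send(data):
                # Each byte travels with its parity bit: extra bits spent on checking.
                return [(b, parity(b)) for b in data]

            def receive(frames):
                for byte, check in frames:
                    if parity(byte) != check:
                        raise IOError("distorted item detected -- ask for a re-send")
                return bytes(byte for byte, _ in frames)

            frames = send(b"HELLO")
            frames[2] = (frames[2][0] ^ 0b1, frames[2][1])   # distort one bit in transit
            # receive(frames) now raises IOError instead of delivering bad data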


      Example Communications Link -- Cell Phone

      Latency: the time to make the call. Bandwidth: How fast can you talk? Reliability: Distortion, frequencies clipped, Breaking up, bad coverage, dropped calls.

      Example Communications link -- campus Wi-Fi

      Latency: time to log in and get back to the application. Bandwidth: Depends on load and which one you choose -- about 58Kbps -- any exact figures? Reliability: Seems pretty good... what do you think?

      Story -- Pigeons as a data connection

      For example: A company uses pigeons to carry a memory stick from a camera to their home base. Why? OK... the latency is not good... it takes 30 minutes for the pigeon to fly down the Grand Canyon. All the company wants is to have the photographs available to the people rafting down the canyon before they get back. But the bandwidth is huge -- a whole memory stick's worth of data per flight. Reliability? Depends on the presence of hawks!

      Exercise -- evaluate some other technologies.

      Manual Connections

      1. Paperwork -- provides a record of the communication. Can be scanned back in or retyped. Best done by other people!
      2. Sneakernet -- Copy the data to a memory chip/flash drive/floppy/zip/tape/etc. and walk it to the other party. Slow, cheap, but reliable!
      3. Face to Face.
      4. Phone...
      5. VOIP/SKYPE?

      Wired Connections

      1. A series of "Best technologies" for connecting devices no more than 6 feet apart by wire, like a disk drive and a PC:
        1. Ancient Proprietary systems.
        2. Serial -- the venerable RS232 interface and twisted pairs...
        3. Parallel -- the classic Centronics Printer ribbon cable.
        4. Small Computer System Interface [ SCSI ] (pronounced: skuzzy).
        5. Universal Serial Bus [ USB ]
        6. Firewire (The latest IEEE sponsored way of hooking up devices. Examples include cameras to PCs. IEEE 1394. See Wikipedia [ Firewire ] and this IEEE tutorial [ 2.730740 ] )
        7. ...

        Ethernet -- a protocol for transmitting data inside a single network. Its design grew out of a wide-area radio network (ALOHAnet, in Hawaii); it was then adapted to coaxial cables and then to twisted pairs.

        You can build an internet on top of any technology -- even phones and modems. The first internet connected several Ethernet networks. The Internet is defined by the TCP/IP stack of protocols: TCP defines how to move data reliably and IP defines how to route it across multiple networks. Internet technology is largely defined by the Internet Engineering Task Force [ http://www.ietf.org/ ] (IETF) and the "Requests For Comment" [ http://tools.ietf.org/html/ ] that they archive. There is now a ton of technology that drives the Internet, including repeaters, routers, switches, firewalls, Domain Name Servers (DNS), and so on.
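
        Here is a minimal sketch of that stack in action, in Python: open a TCP connection (TCP moves the bytes, IP routes them across networks) and send one HTTP request over it. The host name is just an example; any reachable web server would do.

          import socket

          host = "www.example.com"        # example host
          with socket.create_connection((host, 80), timeout=5) as s:
              # HTTP, a WWW protocol, rides on top of TCP, which rides on IP.
              request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
              s.sendall(request.encode())
              reply = s.recv(1024)
          print(reply.decode(errors="replace").splitlines()[0])   # e.g. "HTTP/1.1 200 OK"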

        WWW -- The World Wide Web is built on top of all the previous Internet technologies...

        VPN -- Virtual Private Network, using encryption to fake an isolated network. The introduction of VPN technology looks a little fuzzy, but the ideas were around in the 1990s and were standardized by 2000. Here is what I dug out of the Internet.

        • VPN's are mentioned and defined in a 1995 editorial by Darren Boulding: [ security.html ]
        • There were IEEE research papers in 1996 [ SDNE.1996.502456 ] and then an editorial [ 2.634834 ] in 1996.
        • The first standard (I can find) is an Internet Request For Comment [ rfc2764 ] posted in February 2000 by Gleeson, et. al.
        • In 2001 Don Hall claimed VPNs were covered by his 1992 US Patent #5,126,728.

      . . . . . . . . . ( end of section Wired Connections) <<Contents | End>>

      Wireless Connections

      1. The security maven's nightmare.... but so convenient.
      2. Blue Tooth for very local connections... Check out [ BlueTooth ] in the Wikipedia when you need details.
      3. IEEE 802.11 -- WiFi, WiMax, ... There is a wide array of IEEE standards for wireless connections. See [ 802_11 ] on the Wikipedia for details.
      4. Data can be sent through a cell phone connection: hand-set to cell tower to internet.

      . . . . . . . . . ( end of section Wireless Connections) <<Contents | End>>

      Network Topology

      The word "Topology" means "the science of position" and in the context of networks indicates the connectivity or structure of the network. So "network topology" is a question of how the parts are connected. We talk about node as the parts and arcs as the connections. More connections mean more money and complexity.... but more connections mean greater reliability:
      1. Bus -- High speed backbone with branches. Used to connect peripherals, the central processor, and memory together inside a single computer.
      2. Linear -- simplest, cheapest, and most likely to be split in two by a single failed link. Each node is connected to one or two neighbors.
      3. Star -- A mathematical tree guarantees that there is one path from any node to another. Or none, if the network breaks down.
      4. Ring -- Send the token round!
      5. Network -- many paths give reliability etc. But costs more. Signals can take the shortest route -- saving time.
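
      One way to see the cost side of that trade-off is to count the links each shape needs for N nodes (the bus is left out because its cost is mostly the backbone itself):

        def links(n):
            return {
                "linear": n - 1,             # a chain of nodes
                "star":   n - 1,             # every node wired to one hub
                "ring":   n,                 # the chain closed into a loop
                "mesh":   n * (n - 1) // 2,  # every node wired to every other node
            }

        print(links(10))   # {'linear': 9, 'star': 9, 'ring': 10, 'mesh': 45}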

      Notes on Setting up Reliable Networks

      I wrote the following notes in response to a request from a student. I hope they help. However I don't expect you to memorize these hints when I write quizzes and final questions.
      1. Get Trained! Take our System Admin sequence: CSCI360, CSCI365, and CSCI366 -- they are part of a BA option.
      2. Abandon any idea of being up-to-date and cutting edge. Remember the bath-tub curve: reliability is best in the middle of a technology's life. The chance of failure is high initially and increases again at the end of the lifetime.

        [Bathtub curve -- high at start, low in middle, increase at end]

        NASA on-board computers are always several generations behind, to maximize reliability. It is best to choose things that other people have already had good experiences with. Others can be bleeding edge:-)

      3. On the other hand keep your software up to date -- MS products get monthly security patches and anti-virus products seem to update several times a week.
      4. Next: how reliable? Running 24 hours a day, 7 days a week, 365 days a year is more expensive than 20 hours a day, 6 days a week.
      5. Reliable cabling: hidden and redundant. Even so, a back hoe, a rat, or a squirrel can bring things crashing down.
      6. Reliable hardware -- and that means a controlled and secure environment. I've known a server to shut down for an hour once a week because the custodian unplugged it and plugged in a floor polisher! Lock up key servers.
      7. Don't forget you will need backup processors and a way to backup and recover data.
      8. Then you need to set up redundant servers for running a network: DNS, NIS or LDAP servers, NFS or other file sharing, Web servers, ...
      9. All software must be fully up to date with patches, else get the last stable release. Design a system that keeps all MS systems up to date.
      10. If you've gone for Wi-Fi then you must secure it. Recall: WiFi works through walls.
      11. Then the security system: fire walls (perimeter and between sub networks).
      12. Did I mention backing up all the data?
      13. Develop admin procedures that monitor and improve reliability.
      14. You need to maintain a configuration management inventory! It documents which version of each component you have on each computer. You will need to know when each component was last updated.
      15. Then train the administrators
      16. Set up the panic button schedule: who comes in at 11 at night to reboot the system after a power cut?
      17. Then come the client machines. Big question: what hardware and what platform? How do you make sure that large numbers of workstations are backed up?
      18. Then train the users in reliable computing.
      19. Did I mention backing up all the data?

    . . . . . . . . . ( end of section Connections) <<Contents | End>>

    System Architecture and UML Deployment Diagrams

      The UML provides a new standard way to describe the architecture of a system: the hardware and the software that it executes. Prior to that there was a branch of flow charting (see [ Systems Flowchart ] below) that was used to indicate the physical devices in a system and how they were connected.

      Simple UML Deployment Diagrams: hardware and connections

      These two figures show the system I used up until Summer 2008, and the replacement I was hoping for. Later I will show you what actually happened. In both diagrams I'm using the old UML1 notation from the old Rational Rose free student edition.

      [Palm+2 PCs+an old iMac+ Flash drive + Mainframe and two servers]

      [Treo+Tablet+iMac with OSX+CSUSB servers]

      In a deployment diagram there are three-dimensional cubes or boxes called nodes. In the diagrams above the cubical boxes represent hardware devices and computers. Deployment diagrams also show the connections between nodes as simple lines with no arrowheads. Finally the software that is deployed onto the hardware is also shown inside the 3D boxes (nodes). This notation changed in 2003 from UML1 to UML2.

      UML2.0 notation for system architecture -- Deployment Diagrams


        Deployment Diagrams
      1. Show Nodes and Artifacts.
      2. The Nodes are 3-D boxes. They can be hardware or software as long as they execute programs. Examples -- A Laptop. An iPod. A Mainframe. The Java Virtual Machine. The UNIX Shell. Visual BASIC.
      3. Nodes have Artifacts.
      4. These artifacts can be listed in the box or shown as rectangles inside the cubes. They are files, programs, data, etc.
      5. The Communication paths are labeled with protocol names. Example: HTTP or TCP/IP or SSH. They are shown as simple lines with no arrowheads.
      6. Artifacts depend on each other. This is shown as a dotted line with an arrow pointing from an artifact to the artifact it depends on. Example: A browser depends on a web server. A PHP script depends on the shell scripts and libraries it calls.
      7. Deployment diagrams can show the components that are manifested by artifacts. But this is rare. This links the system level parts to the software architecture.

      8. From the UML2 Language Reference Manual

        [2 Nodes+4 artifacts+ an interface and a communication path + 2 dependencies ]

      9. Example UML2 -- my 2010 Hardware

        [iPod + Dells + HPs + iMacs + etc.]

      10. Use simple connections between nodes. No Arrows. Just lines marked with the protocol.
      11. No connections between artifacts. But dependencies are OK and common.
      12. Nodes represent "Execution environments" including computers and operating systems.
      13. Nodes can be put inside nodes to show that one executes the other. For example to show that the client PC executes a browser and a Java Virtual Machine.
      14. Artifacts are things that are created: data, programs, scripts, libraries, ...
      15. Artifacts manifest elements of other models -- components, classes, ...

      16. Hardware -> Nodes
      17. Op Systems (if special) -> Nodes inside hardware nodes, else use tagged values.
      18. Virtual Machines -> Nodes. For example, you could show the "Java Virtual Machine" on a PC as the execution environment for compiled Java Applets.
      19. Data bases executing SQL: SQL artifacts on Data Base node.
      20. Browsers that are asked to execute significant scripts and/or applets would also be nodes placed inside hardware. The scripts and applets are shown as artifacts. If the browser executes a virtual machine then this would also be an execution environment. A common example is the JVM -- Java Virtual Machine. Meanwhile on the server one might wish to show Microsoft's Common Language Infrastructure (CLI) as an execution environment for systems that use their .NET Framework.
      21. Simple data bases -> artifacts
      22. files->artifacts stereotyped <<file>>
      23. programs->artifacts stereotype <<process>>

      24. The UML defines how symbols in other kinds of diagram are linked to symbols in deployment diagrams. Classes are encapsulated in Components. Components are manifested as artifacts. Artifacts are deployed to nodes.
      25. In CSE557 we use these diagrams to analyze and design systems. In CS375 we will use them to design software.

      The short article [ Deployment_diagram ] on the Wikipedia gives a brief description.

      Tagged Values

      In the UML you can add constraints to things by using tagged values that look like this
       		{webserver="Apache Tomcat"}
       		{OS="MS XP"}
       		{CPU="Intel ...."}
       		{author=RJB, file="a3.html", source="a3.mth", language="MATHS"}

      These are a loose but useful way of supplying data about nodes and artifacts.

      Stereotypes

      You can also attach some useful stereotypes to artifacts. The following are well known
      Table
      Stereotype      Meaning (UML2)
      <<file>>        A physical file in the context of the system developed.
      <<script>>      A script file that can be interpreted by an execution environment or node.
      <<executable>>  A program file that can be executed on a computer system.
      <<library>>     A static or dynamic library file.
      <<source>>      A source file that can be compiled into an executable file.
      <<document>>    A generic file that is not a source file or an executable.

      (Close Table)

      Example Architectures -- Programming

      Interpreters, Compilers, and Hybrid Languages

      Example Architecture -- iPad, Safari, Web Server, Apache, PHP, Pages, Data

      [Deployment diagram: iPad + Safari + Web Server + Apache + PHP + Pages + Data]

      The diagram above summarizes these facts.

      FYI UML1.* Notation

      You may see some of the older style deployment diagrams. So here are the rules:
      1. Show nodes and components.
      2. Nodes are 3-D boxes. Components can be listed under the box or shown as rectangles with "tongues" on the left inside the nodes.
      3. Computers -> Nodes
      4. Special nodes for devices other than computers.
      5. Connections between nodes are labeled with protocols. No Arrows.
      6. Op Systems shown as components or as a tagged value in a node.
      7. Nodes contain Components.
        • Virtual Machines
        • Data bases
        • files
        • programs
      8. Connections between components show dependency. You can also show the interfaces provided by the components by lollypops.

      Example UML deployment diagrams from a CSE557 Project

      [Browser communicates with Apache server which communicates with the Application server]

      Which to use -- UML1 or UML2

      The web is full of obsolete diagrams. Use UML2 in this class and CS375. The old notation suffers from clutter because it shows both the system architecture and the software architecture (components) on one diagram. UML2 is less cluttered. It separates the systems architecture (deployment) from the software architecture (components). UML1 deployment diagrams could also show devices that were not computers. This feature is missing from UML2 deployment diagrams.

      Advice: Only use UML1 if your organization has a policy or standard that you can not change.

      Exercise if you have time UML1 vs UML2

      Click through to these diagrams I found on the web. [ DemoSysDeploy.jpg ] [ MbariDeployment.gif ] [ deployment-diagram1.png ] [ deployment_diagram.gif ] [ 12779.jpg ] [ way_dep_diagram.jpg ]

      Which of the above are UML1.* and which UML2.0? Which are correct? What conclusion can you draw?

      Classic Architectures

      1. Mainframe plus card input/output and line printers.
      2. Mainframe+terminals: A terminal is a special device with limited functions -- a keyboard for input and a screen for display.
      3. Mainframe+clients emulating terminals.
      4. Stand alone processing. Workstations without connections. Use sneakernet to share data.
      5. File sharing: Networked peer-to-peer workstations share data.
      6. Client/server: Dedicated server serves many client workstations.
      7. Fat and thin clients: A thin client has little special software and cannot execute general purpose programs.
      8. Multi-tier client/server: Many servers with different functions.
      9. Middle-ware: Specialized "Glue" software for connecting tiers.
      10. AJAX -- (asynchronous JavaScript and XML) [ AJAX ]
      11. Peer-to-Peer: All machines have similar power and share the load.
      12. Virtualized Client/Server -- The servers are executed by one machine and each assumes it has the whole machine.
      13. Cloud Computing: relying on the internet and special hidden servers which other people run, power, cool, secure, and maintain.

      History of CSUSB Student Information Systems Architecture

      Even though the functions of the Student Information System have not changed, its name and architecture have changed many times.

      Architectures:

      1. Mainframe+cards and line printer
      2. Mainframe running SIS+ with line printer and cards
      3. Mainframe running SIS+ with access through a PC running a 3270 terminal emulator.
      4. Mainframe running SIS+ and TRACS etc.
      5. Mainframe running SIS+ and TRACS and webreg etc.

      [Student Information Systems through the ages]

      Currently we have a new architecture -- find out about it on the field trips. Peoplesoft = CMS

Questions

    Can you explain architecture in more detail

    An architecture describes the overall structure of something: the parts and how they are connected. A Systems Architecture describes the hardware and software making up the system. The performance, cost, and reliability of a system are often determined by its architecture. Thus we need to be able to record and evaluate architectures: what software and data is placed where, on which hardware.

    Architecture is also a process of choosing the parts to meet the requirements of the stakeholders. We will look at this later [ c2.html ] (Choosing an Architecture) + [ r3.html ] (How requirements drive architecture). There will be more in CSCI375.

    Is it common for people to still use UML 1.0

    Yes... but usually because they haven't learned the new standard.

    What are the major differences between UML1 and UML2.0

    To get a summary of the differences look at [ ../papers/20050502Abstract.html ] and follow the links into the outline and then to the details.

    Which gives more information for systems architecture: UML 1.* or 2.0

    They are about the same, except that the UML1.0 notation is harder to figure out.

    In UML 1.* or 2.0, can an artifact be connected to a node

    No! Artifacts are placed inside nodes: they are deployed on a node.

    in MbariDeployment.gif, why are there small circles labeled JDBC, RS232, etc

    These represent interfaces -- lists of functions that are called/used on one side of the "lollypop" and implemented/provided by code on the other side.

    The particular diagram has a problem: it doesn't make clear which component provides the functions and which one uses them. It is better to have a dotted line from the client that uses the interface to the circle indicating it. Then a short solid line (think lollipop) from the interface to the component or class that provides it. The UML2.0 version has a cup for the client but this is hard to draw!

    We'll talk more about this notation in CSCI375.

    What are some other examples of "Executable environments" that nodes represent other than computers and operating systems

    Other examples include database engines, browsers executing scripts, and virtual machines like the JVM. However, only show these as nodes if there is something special and unobvious that needs explaining. It is simpler to just mention them as [ Tagged Values ] in the node. Some people just list the internal environments and artifacts inside a node -- with no special notation.

    In both UML1 and UML2 it states there should be no arrows between nodes. However in some of the diagrams where you ask us to distinguish between the two there are arrows; please explain.

    Many people get this wrong. The result is nonstandard. In your work in this class we will do it right.

    Is there any other notation besides UML that I can use for system architecture

    Yes. The American National Standards Institute (ANSI) and the European Computer Manufacturers Association (ECMA) provide very similar rules for systems architectures. They define specially shaped symbols for different devices and the connections between them. This is called a Systems Flowchart and here is an example from my Ph.D. Thesis (1971, Brunel University)

    [Two computers with peripherals and an interface]

    1. It shows the British mainframe (ICL 1900) with its magnetic storage, line printer, card reader, and hard disks. I seem to have forgotten the Card Punch -- or else it arrived after I drew the diagram.
    2. It shows the ICL 803b with its teletypes, paper tape station, plotter, and a prototype graphical display unit called the ETOM.
    3. The two computers were connected by a standard interface -- the British forerunner of the later SCSI.
    4. There are also two comments supplying information on the storage available on each machine: 1900: 32K * 24 bits, 803: 8k * 39 bits.
    5. There was no standard way to show two-way flows, digitizers, or plotters. I had to fake them.
    6. In those days computer people all owned a "Flow Charting Stencil" and a collection of drawing tools. My thesis was about the algorithms and languages needed so that people could draw diagrams using a computer instead. All hail to "MacPaint", "Macdraw", ... "Dia", and even "Visio"!

    Here are some of the symbols from that era drawn by the free "Dia" tool.

    [Flow chart symbols for CPU, RAM, Disks, etc]

    However, Systems Flowcharts do not let you show what is deployed on the hardware or the details (when needed) of the nodes, as the UML diagrams do. For example, if I drew a UML2 Deployment diagram of the old system it would not show the peripheral devices but it would show the software. The 803 had Algol, SAP, and PictAlgol while the 1900 had FORTRAN, COBOL, and PLAN. The ICL 1900 had an Operating System (OS) called George 3 but the 803 had no OS:

    [Two nodes: ICL1903a and 803b, BSI connection, and software ]

    (Drawn using the Visio UML2 template by Hruby)

    Are UML deployment diagrams a form of data flow diagram

    No. They are about the physical connections between hardware and software. DFDs are about the abstract flows of information between abstract processes and data stores.

    What are nodes and artifacts

    A node is something that can execute code.

    An artifact is anything that is made and placed on a computer -- including data, documentation, files, programs, etc.

    There is one tough choice: is an interpreter that you must program an artifact or a node? My answer: only show it as a node if you have an artifact that it will execute. Then you need a box in which to place the artifact.

    Which one is better a thin client or fat client

    Depends on the project -- look at the requirements. Look at the properties of the hardware: the client's machine, the connections, the server. How fast are they? Can you easily download the extra software to "fatten the client"?

    How do you choose an architecture

    We will cover this in detail later. Right now understand that it is the desired qualities of a system (security, reliability, speed, size, ...) plus the reality that drives systems architecture.

    What is the architecture you feel is most secure, and why

    The most secure architecture is a mainframe in a locked room with no connection to the outside world:-)

    I think that operating systems that tackled security a long time ago tend to be more secure... especially when they are not the most popular ones. So I like the UNIX based ones. I've had good security experiences with the BSD versions of UNIX from way back.

    Linux or Mac OS X seem to be fairly secure. To make Windows 2K secure is tough if not impossible: I unplug mine from the network whenever I leave the office, and I run a personal firewall, and a virus checker, and the MS updates,... and set all the software to the most secure settings, and use a suite of tools that encrypts its data...

    The most insecure component in any architecture is a foolish person -- fools are too ingenious.

Extra -- More on using Pigeons to transmit data

Reuters: September 10 [ http://au.news.yahoo.com/a/-/mp/6016150/pigeon-transfers-data-faster-than-south-africa-telkom/ ] reports that a South African company protested the speed of the local telecommunication companies by using a pigeon to send data. Quote:
    the 11-month-old pigeon, Winston, took one hour and eight minutes to fly the 80 km (50 miles) from Unlimited IT's offices near Pietermaritzburg to the coastal city of Durban with a data card strapped to his leg.

    Including downloading, the transfer took two hours, six minutes and 57 seconds -- the time it took for only four percent of the data to be transferred using a Telkom line.


Why is the above comparison invalid?

Further reading [ 25.78.html ] (The Risks Digest).

Extra -- the future of the PC

Extra -- Build your own data center -- at home

[ building-servers-for-fun-and-prof-ok-maybe-just-for-fun.html ]

Review Questions

  1. Do you agree with the following article? [ the-pc-is-over.html ] (your answer should be several sentences long)!
  2. List a dozen Input Devices.
  3. List a Dozen Output devices.
  4. Compare Blue Tooth with IEEE802-11
  5. What options are available for processors?
  6. What is the principle of locality?
  7. Define latency, bandwidth, and reliability.
  8. Which UML diagram lets you describe the hardware and software that is used in a system?
  9. What is a Node in the UML?
  10. What is an artifact in the UML?
  11. Compare and contrast: thin and fat clients.
  12. Draw diagrams of the architectures of the systems at CSUSB.
  13. Draw a UML2.0 deployment diagram of a web based system where the server uses the JVM to generate pages, and the client is a PC running MS Windows Vista and MS IE6.3.2 with a JavaScript programs. The JavaScript and the JVM use XML to communicate.
  14. Find out about AJAX and draw a UML2.0 deployment diagram of it.
  15. Translate the following diagram into a series of simple English statements. For example: A is a node connected to .... C is a node that executes ...

    [Deployment diagram with parts labeled A through E]

  16. Check out the trends [ http://news.cnet.com/8301-30685_3-57400136-264/survey-android-programmers-shifting-toward-web-apps/ ] on different mobile platforms. Then read the description of a hybrid app and draw a deployment diagram that explains what a hybrid app is.

Abbreviations

  • TBA::="To Be Announced".
  • TBD::="To Be Done".

    Links

    Notes -- Analysis [ a1.html ] [ a2.html ] [ a3.html ] [ a4.html ] [ a5.html ] -- Choices [ c1.html ] [ c2.html ] [ c3.html ] -- Data [ d1.html ] [ d2.html ] [ d3.html ] [ d4.html ] -- Rules [ r1.html ] [ r2.html ] [ r3.html ]

    Projects [ project0.html ] [ project1.html ] [ project2.html ] [ project3.html ] [ project4.html ] [ project5.html ] [ projects.html ]

    Field Trips [ F1.html ] [ F2.html ] [ F3.html ]

    Metadata [ about.html ] [ index.html ] [ schedule.html ] [ syllabus.html ] [ readings.html ] [ review.html ] [ glossary.html ] [ contact.html ] [ grading/ ]

    End