R J Botting, CS372 Course Materials (a4), Computer Science & Engineering Dept., CSUSB. Thu Nov 12 13:09:50 PST 2009



    Modeling the Data in a System

      Story -- Data determines the feasibility of systems

      Recently, at this campus we rolled out a new system for handling registration and related records, including grades. It is called the "Common Management System" or CMS. A faculty member asked in the Fall of 2009 about posting "Incomplete" grades.
        Given that Faculty members now must use CMS to enter grades, why is the hardcopy multiple copy form required?

      Notice that this is a classic system-improvement pattern: remove paperwork.

      And here is the reply


        It just so happens that CMS has developed an incomplete form that will be delivered to us soon. We plan to roll it out in the Winter Quarter as we need time to do set up and training. The form does have a dialog box to enter what the student needs to complete and a deadline for that work. It also allows for a default grade other than the "F" or "NC" if sufficient work was completed at the time of the contract.

        Since some students need to sign or accept (they can do the acceptance through MyCoyote Student Grades) the contract before the grade rosters will be available, this feature will be added to the class roster, too.

        Our campus requested the Incomplete form as a result of the CSU Student Records Audit that noted we had not received forms for all of the incomplete grades, they were not completed properly when received, and did not have a student signature.....This is considered a contract between the faculty member and the student, so it does require both signatures.


      The answer was that the originally implemented system could neither input nor store the required data for the function of filing an Incomplete grade.

      Moral: find out what data exists, what is needed, and how it can be computed or input.

      By the way, also note the implementation strategy: roll out only some of the functionality at each iteration. Start with a good (but incomplete) system and add to it periodically. We will compare this to some alternatives (Big Bang, for example) later.

      Story -- Available Data affects Reliability and Usage

        When we could only get rosters in hard-copy we would have to type the names and student IDs into a spreadsheet. This always introduced errors in the data.

        But when I could read the rosters through a terminal and copy/paste the data into a spreadsheet the errors almost disappeared.

        The latest system (CMS) gives a teacher the option to download his or her roster directly as a spreadsheet. This is a very useful feature. It is the fastest and most reliable system I have used.

        Similarly, Course Management Systems on this campus (Blackboard, Moodle) also extract CMS data to populate the grading subsystem. Again, having easy access to the data makes the system an improvement over previous systems.

      Introduction to data models

      System flow charts are a popular, simple, yet general model of a system. They show stuff moving through it, being 'stored' in it, being 'processed' by it, entering it, and leaving it. This is a classic PowerPoint slide! We call the movements 'flows'. Here are some classic flows: goods, money, data, and objects. All of them are shown as arrows, which can be confusing.

      Computer-based systems are almost entirely about handling data. To analyze and design systems that handle data we need a specialized diagram for showing the flow of data. These are called Data Flow Diagrams (DFDs). We also need specialized diagrams for showing the structure of data. These are called Entity-Relationship Diagrams (ERDs).

      If you want to change a system, it is vital to understand the data in it. The technical feasibility of a new system will often depend on what data is already available. Samples of data (printouts, forms, manual files and records) are a good starting point. So are the descriptions of data in the documentation and source code of any software in the system. But you need to make a more abstract or essential model of two things: (1) the dynamic flow of the data through the system, and (2) the static structure of the data in the system. To master the complexity of a real domain you need diagrams that show just the essentials: how the data moves, where it is stored, and how different data is related. These are best done by drawing DFDs (Data Flow Diagrams) and simple ERDs (Entity Relationship Diagrams). The details are often described in a Data Dictionary, which we will cover later.

      Information Technology is all about delivering information to people. Information is data provided to the people who need it, in their preferred format, at the right time. Information needs to be computed reliably, cheaply, and securely. Tracing the flow of data from source to sink is a vital technique to achieve this aim.

      Story -- Go with the flow

      When I worked in the British Civil Service, a colleague described the following meeting. He had been invited to visit a branch and was there for the day to consult with them about a new computer system they wanted to develop. They explained that they wanted a program to print out a 20-page report. Each page had 20 columns and 50 rows.... He and they worked on the content and format of the report, and half an hour before lunch they had the whole thing defined and ready to be programmed. The programming would be done by another team. There was, apparently, nothing to do for the next 30 minutes.

      So the analyst asked -- "What do you do with the 20 page report?" And a manager replied -- "I look for the row with the largest value in column 17." So my friend asked: "would you like the computer to do that for you?" They replied: "Can a computer do that?" He said "Yes -- and printing one line instead of 1,000 will save money!". They agreed.

      So my friend asked: what do you do with the row of data? They told him "We multiply the 2nd column by the 4th column and subtract the 5th column". And he said: "The computer can do that too, if you want." They liked it.

      So my friend -- now hot on the track of the end of the data flow -- asked "what do you do with the result you calculated" -- they said "if it is greater than 100 we send a memo to the manager listed in column 1." At last, my friend had found an action... "Could we just send the memo for you and let you know it was sent?"

      They then went to lunch at a local pub...

      Moral -- always ask where the output data goes to. And contrariwise -- ask where input data comes from.
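      Following the data to its sink turned a 1,000-row printout into one small computation. Here is a minimal Python sketch of the chain the colleague uncovered, assuming the report is a list of 20-column rows with the manager's name in column 1. The column numbers and the threshold of 100 come from the story; the function names and any sample data are hypothetical.

```python
from typing import List, Optional

# Column numbers as told in the story (1-based); Python lists are 0-based.
MANAGER_COL, COL_2, COL_4, COL_5, COL_17 = 1, 2, 4, 5, 17
THRESHOLD = 100  # "if it is greater than 100 we send a memo"


def cell(row: list, col: int):
    """Fetch a 1-based column from a row."""
    return row[col - 1]


def memo_needed(report: List[list]) -> Optional[str]:
    """Hypothetical end-to-end flow: return the manager to send a memo to, or None."""
    # Step 1: find the row with the largest value in column 17.
    top = max(report, key=lambda row: cell(row, COL_17))
    # Step 2: multiply column 2 by column 4 and subtract column 5.
    result = cell(top, COL_2) * cell(top, COL_4) - cell(top, COL_5)
    # Step 3: the only output that really leaves the system is the memo.
    return cell(top, MANAGER_COL) if result > THRESHOLD else None
```

      The printed report, the manual search, and the hand calculation all disappear; only the action at the end of the flow remains.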

      ERDs and DFDs

      We analyze and design data flows using: external entities (input/source, output/sink), processes, and stores. The data flow diagram or DFD is the central diagram used in information technology. We also analyze the data itself, in an ERD, to find out what its underlying structure is.

      Once you have a DFD, it is useful for pinpointing the changes the enterprise needs to make. You can use DFDs to present the choices to management. They form an excellent start for specifying the hardware and software that will be needed. Meanwhile the static model, the ERD, is the starting point for designing a database and then designing objects inside software.

      In summary DFDs and ERDs are a useful intermediate step between problems/opportunities and solutions/plans.

      DFDs -- Data Flow Diagrams

        A DFD is a circuit diagram of an enterprise or a subsystem. When done right, following some very specific rules, it becomes a rigorous picture of an information processing system. Sometimes we inherit DFDs as documentation of an existing legacy software system. This can be very helpful.

        They are good for

        1. Making rough notes when interviewing people.
        2. Mapping out existing systems to find out things to change and things to leave alone.
        3. Planning new systems.
        4. Verifying our designs: how will they work?
        5. Presenting our plans to management and stakeholders.
        6. Specifying a process or function as a black box -- with hidden details inside.
        7. Documenting a system to help others understand it.
        8. Getting a list of data stores to start an Entity-Relationship or Conceptual Business Model.

        Definitions of DFDs

      1. DFDs::="Data Flow Diagrams".
      2. DFD::="A diagram that shows how data moves through processes and stores, from sources to sinks, in a system". A data flow diagram has
        1. External Entities -- Where data comes from or goes to
          • Sources -- Where data comes from
          • Sinks -- Where data goes to
          • Some External Entities are both sources for some data and sinks for other data.
        2. Processes -- Where things happen to data
        3. Stores -- Where data is held ready for future use
        4. Data Flows -- connecting processes to and from entities, processes, and stores.
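        One way to make these four kinds of part concrete is to record a DFD as data. The Python sketch below is an illustrative encoding, not a standard format: the class names `Node`, `Flow`, and `DFD` and the `kind` strings are assumptions. It encodes the Author/Document/Printer example that follows, with hypothetical process names ("Edit document", "Print document") standing for the activities the diagram implies.

```python
from dataclasses import dataclass
from typing import List

# The three kinds of node in a DFD; data flows are the arrows between them.
ENTITY, PROCESS, STORE = "entity", "process", "store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of ENTITY, PROCESS, STORE


@dataclass(frozen=True)
class Flow:
    source: Node
    target: Node
    label: str = ""  # name of the data; required between processes


@dataclass
class DFD:
    nodes: List[Node]
    flows: List[Flow]

    def describe(self) -> List[str]:
        """Turn each arrow back into one of the simple statements a DFD summarizes."""
        return [f"{f.source.name} -> {f.target.name}" + (f" ({f.label})" if f.label else "")
                for f in self.flows]


# Hypothetical encoding of the Author/Document/Printer sketch.
author = Node("Author", ENTITY)
printer = Node("Printer", ENTITY)
document = Node("Document", STORE)
edit = Node("Edit document", PROCESS)
print_doc = Node("Print document", PROCESS)

author_dfd = DFD(
    nodes=[author, printer, document, edit, print_doc],
    flows=[
        Flow(author, edit, "changes"),
        Flow(edit, document),
        Flow(document, edit),
        Flow(edit, author, "preview"),
        Flow(document, print_doc),
        Flow(print_doc, printer, "printout"),
    ],
)
```

        Calling `author_dfd.describe()` lists one line per arrow; every line has a process at one end or the other.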

        Here is an example of a rough pencil and paper DFD:

        [Author (entity) writes Document(store) and prints Document to Printer]

        Each DFD summarizes a collection of simple statements. The above diagram implies some of the following facts:

        1. The Author makes changes to the document.
        2. The Author reads a preview of the document.
        3. The document is printed on the printer.

        Physical and Logical Data Flow Diagrams

        A DFD can be used to model the physical or logical structure of a system. The physical model describes and names the hardware and software involved. Typically, each process is one program. Each store is a separate file (think folder in a filing cabinet) or a table in an existing database. In other words, physical DFDs show the architecture of the system, not its underlying logic.

        In a logical DFD there is no mention of how something is done. No technology is mentioned. Several programs may be inside a single process. One program may implement several processes. Stores are not described in terms of their media (data base, mag tape, disk, RAM,...) but are named for the entities (outside the system) that they store information about (student, teacher, ...).

        As a rule you should aim to move to logical DFDs as soon as possible. You can then solve the logical problems in the system without getting confused in the technology. This process produces a top-level design for a new system and is the start for specifying data and programs.

        Notations for DFDs

        There are three different icons in a DFD: External entity, Process, and store.

        There are several different notations: Yourdon and/or De Marco, Gane & Sarson, SSADM (Structured Systems Analysis and Design Method), and the Unified Modeling Language all have ways of showing DFDs.

        [Four Notations for DFDs]

        The SSADM notation was developed by the British Civil Service (with LBMS Ltd.) from the Gane and Sarson notation. It is used in England and what used to be the British Commonwealth. As far as I can judge, the Gane and Sarson form is the most often used notation in the USA. I will use Gane and Sarson and encourage you to do so as well. But different enterprises will use different notations. The Gane and Sarson notation also allows a process box to have three compartments. These are used for: (top) a unique process ID; (middle) a description of the function of the process; (bottom) the location where the process is carried out, or the actor responsible for the process.

        Below I have some notes [ UML notations for DFDs ] that show how the UML is used and explains why you should, for now, use one of the other notations rather than the UML.

        Semantics of DFDs

          Many people misunderstand DFDs -- they don't know what they mean. This section is about the meaning of the parts of a DFD. It is vital that you study the meaning of diagrams as well as just the syntax -- notation.

          Semantics of External Entities in DFDs

          External entities are outside the current system. There are sources and sinks. Sources show where data flows into the system from outside. Sinks show where data leaves the system. Some entities are both sources and sinks. We tend to think of entities as people, but they can be parts of other systems: hardware and/or software. The key point is that we cannot redesign external entities. Our system has to fit them. They are also the main source of disturbances that the system must handle. We cannot control the input from an external source unless we have a process that handles anything that can happen and sieves out the data our system needs.

          Semantics of Processes in DFDs

          Processes are the only active part of a DFD. They are the only place where results can be computed, data processed, and decisions made. Data does not flow without a process to move it. A process is best thought of as a continuously running program. Processes handle whole streams of data. They may wait when the data is not available, but they do not stop. They may repeat the same computation on each item of data as it arrives. They can make decisions and route input data to different outputs. Processes can also wait to be asked for data and then provide it on their outputs. Try not to see them as steps in an algorithm; use an activity diagram (later) for algorithms.
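          As an analogy, a process can be sketched as a Python loop over an incoming stream that never "finishes a step" but handles each item as it arrives and routes it to one of two output flows. The function name `route_grades`, the (student id, grade) record format, and the rule that an "I" grade goes to a separate Incomplete flow are all hypothetical.

```python
from collections import deque
from typing import Iterable, Tuple

Record = Tuple[str, str]  # hypothetical record: (student id, grade)


def route_grades(stream: Iterable[Record], posted: deque, incompletes: deque) -> None:
    """A hypothetical process: it runs for as long as input keeps arriving,
    makes a decision about each item, and sends it to one of two outputs."""
    for student_id, grade in stream:
        if grade == "I":
            # Needs an Incomplete contract before it can be posted.
            incompletes.append((student_id, grade))
        else:
            # Goes straight on to the grade roster.
            posted.append((student_id, grade))
```

          If `stream` were an endless generator fed by a terminal or a network, the same loop would simply keep running and wait for the next item.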

          Some processes are subsystems. This helps keep the diagrams of complex systems simple. Such a process is shown as a single process in some DFDs and is also defined by a DFD of its own. This is called the refinement of the process. Such processes can contain hidden data stores and sub-processes. There is a potential tree of refinements.

          Semantics of Data Stores in DFDs

          Stores are places where data is placed, and where it waits to be used. Some people use the CRUD mnemonic to describe the interaction between a process and a data store:
        1. CRUD::acronym=Create + Read + Update + Delete.

          Ultimately, the data flows between processes and data stores are (nowadays) programmed using the Structured Query Language (SQL). For example:

           	SELECT StudentName FROM Student WHERE Student.id = '123-45-6789';
          However, it is a mistake to go into this level of detail in a DFD.
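          Outside the DFD, at the implementation level, the four CRUD interactions with a Student store map onto four SQL statements. A minimal sketch using Python's built-in `sqlite3` module with a throwaway in-memory database; the table layout (`id`, `StudentName`) follows the query above, and the names and ids are made up. The `?` placeholders are sqlite3's way of passing values safely instead of pasting them into the SQL text.

```python
import sqlite3
from typing import List, Tuple


def crud_demo() -> Tuple[str, List[tuple]]:
    conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
    cur = conn.cursor()
    cur.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, StudentName TEXT)")

    # Create: a process adds new records to the store.
    cur.execute("INSERT INTO Student (id, StudentName) VALUES (?, ?)",
                ("123-45-6789", "Pat Smith"))
    cur.execute("INSERT INTO Student (id, StudentName) VALUES (?, ?)",
                ("987-65-4321", "Lee Jones"))

    # Read: the query from the text, with the id passed as a parameter.
    cur.execute("SELECT StudentName FROM Student WHERE Student.id = ?",
                ("123-45-6789",))
    found = cur.fetchone()[0]

    # Update: locate the record and rewrite it.
    cur.execute("UPDATE Student SET StudentName = ? WHERE id = ?",
                ("Pat Smith-Lee", "123-45-6789"))

    # Delete: locate the record and remove it.
    cur.execute("DELETE FROM Student WHERE id = ?", ("987-65-4321",))

    remaining = cur.execute("SELECT id, StudentName FROM Student").fetchall()
    conn.close()
    return found, remaining
```

          In the DFD all of this is just one arrow (or a pair of arrows) between a process and the store named Student.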

          On the other hand, you should aim to label each data store with the name of a single type of real-world object. The data store holds records about all entities of that type. The name of the data store should reflect the type of entity. Ultimately, stores become tables in a database or files.

          Traditionally, creating data in a data store (adding new items) is shown by an arrow that flows from a process to the data store. Reading data is indicated by an arrow from the store to the process that needs it. Updates and deletions are shown as two-way arrows, since the data has to be read and then rewritten.
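          These conventions amount to a small lookup table. The Python sketch below encodes them; the function name `arrow_for` and the direction strings are hypothetical labels chosen for illustration.

```python
from typing import Set

# Arrow conventions between a process and a data store, as described above.
ARROWS = {
    "create": "process -> store",   # new items flow into the store
    "read": "store -> process",     # the process fetches what it needs
    "update": "process <-> store",  # read, then rewrite
    "delete": "process <-> store",  # read, then remove
}


def arrow_for(operations: Set[str]) -> str:
    """Combine the CRUD operations one process performs on one store into one arrow."""
    if not operations:
        raise ValueError("a process with no CRUD operations needs no arrow")
    directions = {ARROWS[op] for op in operations}
    if len(directions) == 1:
        return directions.pop()
    # Any mixture of reading and writing needs arrowheads at both ends.
    return "process <-> store"
```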

          Notice that a data store is needed whenever data is reordered or reorganized. On the other hand, if data simply waits in a queue or buffer, so that the first item to arrive is the first to be output, then we don't show a data store: arrows are understood to be buffered by a queue.

          Another note: you can simplify diagrams by drawing the same data store in several places. Traditionally, each duplicated store is marked with a stripe at its left-hand end.

          Semantics of Arrows in DFDs

          The meaning of a data flow (arrow in a DFD) is subtler than you might guess. It depends on the symbols at each end: process, entity, or store.

          Notice that only a process can move data. So each data flow must either come from or go to a process. We do not permit data flows to connect entities or stores unless a process is involved.

          Connections between processes and entities define the interfaces between the system and its environment. It is rarely unambiguous what data is communicated. Thus these data flows must be described -- at least given a name.

          Similarly, when you connect one process to another process with an unlabeled arrow, it is not clear what is going on. The arrow needs to be named with the data being transmitted. The name will need further definition (later) in a Data Dictionary. Occasionally you will meet a double-headed arrow; here someone has to define the protocol that describes the conversation between the two connected processes.

          Notice that in real systems (unlike computer programs) data flows between processes are buffered. One process writes the data and the data waits in a queue until the other process reads it. The writer doesn't have to wait for the data to be taken away. For example, when you send me Email it is automatically stored until I read it. Similarly, "Snail Mail" is put in my box. Memos, rosters, etc. are all buffered for me. So when modeling a real system you don't have to say that data in a data flow is in a queue. This buffering is implicit in the Data Flow model.
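          The implicit buffer on a data flow behaves like a first-in, first-out queue: the writer carries on immediately and the reader takes items later, in arrival order. A minimal Python sketch using `collections.deque`; the names `mailbox`, `send`, and `read_all` and the sample items are hypothetical.

```python
from collections import deque
from typing import List

mailbox: deque = deque()  # the implicit FIFO buffer on one data flow


def send(item: str) -> None:
    """Writer side: append and carry on; there is no need to wait for the reader."""
    mailbox.append(item)


def read_all() -> List[str]:
    """Reader side: later, take items off in the order they arrived."""
    items = []
    while mailbox:
        items.append(mailbox.popleft())
    return items
```

          Because the order never changes, no data store needs to be drawn for this buffer; a store appears only when data must be reordered or reorganized.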

          A data flow out of a store can only go to a process. It indicates that the process reads the data in the store but does not change it. External entities and stores are not allowed to read data directly -- they must get the data indirectly via a process. However, you don't have to label and document these data flows if the process can read the whole store. You only have to document the data flow from a data store if the process accesses only a part of the store.

          A data flow into a store must again come from a process. It indicates any combination of the three basic operations: Create, Update, or Delete. Again if the arrow is unlabeled then it is assumed that the process can (or will) change any item in the store.

          A double-headed arrow between a data store and a process indicates that the process may: create, read, delete and update the data in the store. Some omit the arrow heads in this case.


        Drawing DFDs

        Keep DFDs simple by keeping them abstract, logical, or essential -- don't document the media and format of the information -- just give it a meaningful name. Note: you can keep a list of the current or planned media/formats in a "data dictionary". Similarly a DFD should not show the current type of a part: people, procedures, hardware, and software all tend to be implementations of processes. The type of a component should be noted in a data dictionary (see [ a5.html ] ).

        Do DFDs quickly: pencil and paper, chalkboard. Only tidy them up when someone else needs to see them. Use a tool only to impress people. However, even when sketching roughly, follow the rules and avoid the errors listed on this page.

        Some people put unique short identifiers on each part of a DFD. Avoid this if you can! But in those cases where the boxes are numbered, here are the rules: processes are numbered 1, 1.1, 1.2, ... and data stores have an id that starts with "D" plus a number. External entities can be given single lower-case letters as their unique ids. These ids are good for linking the same part in different diagrams. For example, the parts numbered 1.1, 1.2, 1.3, etc. are all parts of the process numbered 1. Similarly, 1.2.1, 1.2.3, etc. are subparts of process 1.2.
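        With this numbering the refinement tree can be recovered from the ids alone. A small Python sketch of that check; the function names `parent_of` and `is_part_of` are hypothetical. Note that it compares whole id segments, so 1.12 is not mistaken for a part of 1.1.

```python
from typing import Optional


def parent_of(process_id: str) -> Optional[str]:
    """'1.2.3' -> '1.2'; a top-level process such as '1' has no parent."""
    head, sep, _ = process_id.rpartition(".")
    return head if sep else None


def is_part_of(process_id: str, ancestor_id: str) -> bool:
    """True if process_id lies somewhere inside the refinement of ancestor_id."""
    parent = parent_of(process_id)
    while parent is not None:
        if parent == ancestor_id:
            return True
        parent = parent_of(parent)
    return False
```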

        Never use more than one piece of paper for a DFD. The trick is to have layers of detail. We do this by expanding, exploding, or refining a process into a lower level diagram. This is done by taking a process and drawing a DFD that would replace it in the original DFD. There are three levels of detail commonly needed: context, level-0, and level-1. Here is a picture of how refinement works:

        Three levels of DFD

        The table below summarizes the levels of DFD; definitions and examples follow it.
        Level            | Content
        -----------------+--------------------------------------------------------------------
        Process Context  | Shows one process with its inputs and outputs only.
        System Context   | One process + surrounding external entities.
        Level-0          | Make the central process BIG and draw stores, processes, and flows inside.
        Level-1          | Take a process on the level-0 and repeat the expansion in another DFD.
        Level-n+1        | Take a level-n process and refine it.

        Note: 3 or 4 levels is usually enough. Don't get too detailed; other techniques [ r1.html ] are better for the details.

        Examples of Levels

        [Context]

        [Level 0]

        [Level 1]

        A Note on level terminology

        I will be following well-known textbooks on the naming of the levels. Wikipedia seems to use different names.

        Definitions of DFDs

      3. Context_DFD::DFD=Shows a system as a single process surrounded by external entities. This should show a single process (your system) surrounded by the external entities that send it data and get data from it. Each data flow should be named. No internal details are allowed; they come later. No data stores, no sub-processes: it just establishes the boundary between the system and its environment.

        One Process takes questions and answers from faculty and uses them to tutor students

      4. Level_0_DFD::DFD=Shows the main functions in a system as processes.... At this level you show up to about a dozen main functions that the system provides, plus the data stores and external entities that interact with the processes. A Level_0_DFD always expands a Context_DFD.

        Expansion of Tutor DFD into 2 data stores and 5 processes

      5. Level_1_DFD::DFD=Takes a single process in a Level_0_DFD and shows the details inside it.

      6. Fish_eye_DFD::DFD=Shows a DFD inside a box representing a process in another DFD. We have a central focus where we show the details, but round the edge we have higher-level symbols. An excellent way to refine a Context DFD to Level 0, a Level 0 process to Level 1, and so on. It is called a fish-eye diagram because when a fish looks up out of the water it sees the whole 180 degree view compressed into a small circle. In the center of the view things look big. Further out, things look small.

        Refining a DFD

        The process of finding out what is inside a process has many names: leveling, refinement, filling in the details, partitioning, exploding, decomposing, ... It is an important strategy for analyzing a problem. Start with the big picture -- the context -- then break it into smaller and smaller parts. Ultimately, as you decompose or refine processes, you will find yourself needing to express logical rules, algorithms, and types of data. Do not use a DFD to express complex logic, algorithms, or data structures. Instead, record these details by using techniques introduced later in this course:
        Part of the DFD    | Where to record the details
        -------------------+-------------------------------------------------------
        Processes          | Activity diagrams, Use Cases, and Scenarios. Prototypes.
        External Entities  | Persona
        Data flows         | Data dictionary entries and coding techniques.
        Stores             | Entity Relationship Diagrams, Tables, and Normalization


        The above is a top-down procedure. You can also draw rough DFDs of parts of the organization and link them together to get an "end-to-end" model. Here is an example from the first time this course was taught.

        [Free Information System for students 2003]

        These tend to be a little chaotic and unstructured.

        Principle -- DFDs are systems not programs

        My Law: DFDs are good for recording how a system works. They are a way of choosing what parts of a system to change and which to protect. They can be used to define the inputs and outputs to a program. You can use them to plan a collection of new and old software (system design). BUT don't use them to design the internals of a program. You will make errors. There are more modern techniques for designing programs.

        Rules of DFDs -- DFD Errors

        Notice and learn the rules below. The key thought is that data never moves unless a process moves it.

      7. DFD_Errors::=following,
        1. Process names must start with a verb and describe an action. Try the "Hey Mom Test." A process name should make sense when prefixed by "Hey, Mom, I'm going to .....". Some describe producing an output for each input (Calculate tax) but most do more -- Prepare monthly summary from weekly data.

        2. Stores and external entities should be named with specific noun phrases. They must not indicate any activity. They are passive. You may not use the word "Database": it is too general and conveys no information about the data in the store.

        3. Data flows do not transfer control. An arrow is not a function call or a go to! The processes run in parallel. It is OK for a flow to send a message or signal.

        4. Name all data flows between processes. Unlabeled arrows between processes are often control flows and so wrong. However: arrows leaving and entering a well-named store only have to be named when they provide access to only parts of the data in the store.

        5. No Flowcharts. Do not use normal flow chart symbols like decision diamonds, START, STOP etc. in a DFD. All parts of a DFD exist at the same time and operate in parallel. A process can read and store data long before and after producing an output. Processes consume streams of data and produce streams of data.

        6. No magic data flows. Data does not move without a process to move it. So each arrow must have at least one process. Never show arrows connecting an external entity to another entity, or to a data store. Never have an arrow that connects a data store to another store. Examples: Waiter and cook. Coordinator and secretary. Teacher and student. Student to student records. Customer to bank account.

          [no magic flows]

        7. No spontaneous generation. All processes have input.

        8. No black holes. All processes have outputs.

        9. No miracles. The input data must make it possible to compute any of the outputs.

        10. Maintain balance. Each upper level matches its lower level expansions.

        11. No forks or joins. When flows meet or split you must have a process to control the joining and/or splitting.
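        Several of these rules can be checked mechanically once a DFD is written down as data. Below is a hedged Python sketch covering rules 6, 7, 8, and 10. The representation (a dictionary from node name to kind, flows as (source, target, data name) triples) and the message wording are assumptions; rule 9 (no miracles) still needs human judgment.

```python
from typing import Dict, List, Set, Tuple

Flow = Tuple[str, str, str]  # (source node, target node, data name)


def check_dfd(kinds: Dict[str, str], flows: List[Flow]) -> List[str]:
    """Check rules 6-8. `kinds` maps every node name to 'entity', 'process' or 'store'."""
    errors = []
    for src, dst, _name in flows:
        # Rule 6, no magic data flows: every arrow must touch a process.
        if "process" not in (kinds[src], kinds[dst]):
            errors.append(f"magic flow: {src} -> {dst}")
    for node, kind in kinds.items():
        if kind != "process":
            continue
        if not any(dst == node for _, dst, _ in flows):
            errors.append(f"spontaneous generation: {node} has no input")  # rule 7
        if not any(src == node for src, _, _ in flows):
            errors.append(f"black hole: {node} has no output")  # rule 8
    return errors


def is_balanced(inputs: Set[str], outputs: Set[str],
                inside: Set[str], flows: List[Flow]) -> bool:
    """Rule 10: the data flows crossing the edge of an expansion must match the
    named inputs and outputs of the process it expands. `inside` holds the
    names of the nodes drawn inside the expansion."""
    crossing_in = {name for src, dst, name in flows if src not in inside and dst in inside}
    crossing_out = {name for src, dst, name in flows if src in inside and dst not in inside}
    return crossing_in == inputs and crossing_out == outputs
```

        For example, an arrow from a Waiter straight to a Cook would be reported as a magic flow, and an expansion that quietly acquires an extra input would fail the balance check.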

        Advice

        1. Number your nodes only if you have to. Example: the boss says so.

        2. But don't clutter the DFD with the format or media: phone calls, forms, EMail, disks, tapes, printouts, HTML, XML, ... (1) The DFD shows what exists, not what form it takes. (2) Our job always involves changing the format and/or media. (3) Describe media and formats in a separate document called a data dictionary. (4) Note the content as attributes in a separate ERD (below).

        3. Keep DFDs simple by omitting backup, support, and maintenance processes as long as you can. Focus on the operation of the system first.

        Data Flow Analysis of System Development

        In this class we will look at applying systems techniques to the systems work itself. This leads to a model of system development as three parallel processes. One is concerned with understanding the current system plus the latest plans and changes; call this "Analysis". The second process is concerned with taking ideas from the Analysis process and designing plans that need implementing. The last process carries out the plan and changes the system.

        [Analysis -> Design -> Implement -> ...]

        Notice that we can schedule the above DFD in many ways. We can run the analysis process until it produces an idea, then pass it to the design process, which can modify the plan that triggers implementation activity. Whether we get a traditional or an agile life cycle depends on the size of each change to the model and the plan.

        DFD Smells and Patterns

          Much of the expertise that helps us understand and plan systems is encapsulated in the following hints. They are classified as smells that are to be avoided and patterns that work well enough for repeated use.

          Pattern -- Stores contain a model of reality

          In nearly all systems the purpose of storing data is to capture a picture of some reality. So, name stores after the entities that they model. For example a file containing student records should be shown as a store named "Student".

          Exception: a physical DFD names formats and media and so defines an architecture.

          DFD Smell -- useless storage

          Be suspicious of data stores that have inputs without outputs or have outputs without inputs. Storing something that is never needed is wasteful. Having data that cannot be altered or created (no input) is a problem waiting to happen. Example: When I moved office I found I had two filing cabinet drawers full of unread paperwork. I threw it out and plan not to keep such paperwork again.

          Exception: there may be some law that requires you to keep some data for a number of years. Find out if this is true.
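          This smell can be spotted by looking at the arrows into and out of each store. A small Python sketch, assuming flows are recorded as (source, target) pairs and stores are known by name; the function name `smelly_stores` and the sample names are hypothetical. It only flags candidates: check the legal exception above before deleting anything.

```python
from typing import List, Set, Tuple


def smelly_stores(stores: Set[str], flows: List[Tuple[str, str]]) -> List[str]:
    """Flag stores that are written but never read, read but never written,
    or not connected to anything."""
    written = {dst for src, dst in flows if dst in stores}
    read = {src for src, dst in flows if src in stores}
    report = []
    for store in sorted(stores):
        if store in written and store not in read:
            report.append(f"{store}: stored but never used")
        elif store in read and store not in written:
            report.append(f"{store}: nothing creates or updates it")
        elif store not in read and store not in written:
            report.append(f"{store}: not connected to any process")
    return report
```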

          DFD Smell -- wasted motion

          Take note of processes that merely move things around in a system, especially when it is data transmitted as paperwork!

          Pattern -- Remove paperwork

          One of the traditional improvements is to replace paperwork data flows and storage by electronic forms.

          DFD Smell -- old technology

          As you abstract away from the current technology to an abstract set of data flows, processes, and stores, take note of processes and storage that use old technology. But don't clutter the DFD! These are candidates for replacement in the new system. Perhaps, when you present your problem to management, you could color the old technology red? Don't forget that sometimes an old technology is more reliable than brand new technology: some old ways of doing things need to be preserved.

          DFD Smell -- Overloaded Process

          When data (and stuff) flows through an organization it can pile up in buffer zones. A person's desk can slowly disappear under the incoming paperwork, for example. Look for processes that handle their input slower than it arrives. Even if it can just handle the average rate, queuing theory shows that the length of the queue grows without limit.
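          A standard queueing-theory result makes this precise. Assuming the textbook M/M/1 model (items arrive at random at average rate λ, and a single server works through them at average rate μ with exponentially distributed service times), the utilization ρ and the average number of items waiting or in service, L, are:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1 - \rho} \quad (\rho < 1), \qquad
L \to \infty \ \text{as}\ \rho \to 1 .
```

          So a desk that is busy 90% of the time holds 9 items on average, one that is busy 99% of the time holds 99, and a desk that can only just keep up with the average rate is buried.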

          The ideal solution is to have multiple copies of the process running in parallel. Input is distributed to the least loaded or first available server that can run the process. The next solution is to find ways of speeding up the process: better technology, simpler logic, ... Simple examples of this strategy are the faster CPU or enlarged RAM beloved of PC technophiles. But a subtler variation is reorganizing the data storage to give faster access to the data. This trick includes defragmenting disk drives. A third solution is to provide multiple parallel clones of the process running on multiple processors.

          As an example, high-traffic web sites may have a dozen web servers and a special load-balancing "switching" server front end.
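          The "least loaded server" rule can be sketched with a priority queue. This is a hypothetical illustration in Python using the standard `heapq` module, assuming each job's cost is known when it arrives; real load balancers use many other policies (round robin, connection counts, health checks).

```python
import heapq
from typing import Dict, List


def dispatch(job_costs: List[int], servers: int) -> Dict[int, List[int]]:
    """Send each job to whichever server currently has the least work queued.
    Returns, for each server number, the indexes of the jobs it received."""
    heap = [(0, s) for s in range(servers)]  # (current load, server number)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {s: [] for s in range(servers)}
    for job, cost in enumerate(job_costs):
        load, server = heapq.heappop(heap)  # least loaded; ties go to the lower number
        assignment[server].append(job)
        heapq.heappush(heap, (load + cost, server))
    return assignment
```

          With jobs costing 5, 1, 1, 1 on two servers, the big job occupies one server while the other takes all three small ones. In the DFD, the whole farm is still drawn as one process.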

          Note -- multiple computers all running the same process are still a single process in the DFD!

          DFD Smell -- Inefficient, Intractable, and/or Non-computable Processes

          Look out for inefficient processes. These are often concerned with reorganizing data in some way or other. Many times a clever design can make them run a lot more efficiently. Sometimes you can remove the need for the process entirely by rethinking the design. Be aware that a process can often be implemented with many different algorithms and each algorithm will perform differently. You may have to specify an algorithm or give feasible limits on the efficiency of the implementation of a process.

          Computer Science has discovered a large family of processes that cannot be programmed at all. We have also discovered processes that, as far as we know, need very inefficient programs to solve them. It is worth studying theory to be able to spot these.
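          A small taste of an intractable process is subset sum: is there a selection of these amounts that adds up exactly to a target? It is NP-complete, so, as far as anyone knows, no program solves every case efficiently (clever methods help when the amounts are small whole numbers). The obvious brute-force sketch below, with a hypothetical function name, tries all 2**n selections, so each extra amount doubles the worst-case work.

```python
from itertools import combinations


def subset_with_total(amounts, target):
    """Brute force: try every selection of amounts, smallest first.

    There are 2**len(amounts) selections, so each extra amount doubles the
    worst-case work. Returns a matching tuple, or None if there is none.
    """
    for size in range(len(amounts) + 1):
        for selection in combinations(amounts, size):
            if sum(selection) == target:
                return selection
    return None


print(subset_with_total([3, 34, 4, 12, 5, 2], 9))   # (4, 5)
print(subset_with_total([3, 34, 4, 12, 5, 2], 30))  # None
```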

          There are also processes that are better done by a human than a machine. Ethical questions should not be handled by machines! Questions needing discretion should involve humans. Sometimes you need to design systems that support communication and cooperation so that complex (political) problems can be resolved by humans.

          DFD Smell -- Under-worked Human

          You will often find systems where an event occurs and triggers a message that is sent to a person, who does nothing with the message except pass it on to another part of the system. This smell is worst when the human has a fixed, simple procedure for responding to the disturbance. I recently heard of an example: a computer system turned a light on, and a human had to hit a button to avoid disaster when the light came on. The person did nothing and a disaster occurred. This is bad design. Another classic example is any web site that expects you to type in data that it has just shown you, or that you yourself input on a previous page!

          A version of this is sending data to a human to re-input later. This introduces errors... there must be a better way to handle the problem. Here is the smelly system and a possible improvement.

          [Output comes straight back in]

          Pattern -- Automate Simple feedback

          If the choice of action in the above system can be computed from the message then a better system automatically carries out the action and reports to the person. The sensing + acting system should only ask for help on the difficult decisions. The best systems allow the person to input and update the desirable actions.

          [Adding a feedback loop to save human work]

          Examples: EMail -- automatically deleting messages that we don't need to see. Inventory -- automatically reorder when stocks get below a certain level. Record people's browsing, let them replay and/or edit the recordings.
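          A minimal sketch of the inventory example, with hypothetical names: the machine handles the simple, rule-driven case itself and only asks a person when no rule applies or the data looks impossible. The rules live in an ordinary dictionary so that a person can add or change them. Actually placing the order and sending the report to the person are left out.

```python
from dataclasses import dataclass


@dataclass
class ReorderRule:
    """A rule that a person can set and update."""
    reorder_point: int      # reorder when stock falls below this
    reorder_quantity: int   # how many to order


def handle_stock_level(item, on_hand, rules):
    """React to a stock report without bothering a person unless needed.

    Returns ("none", 0), ("reordered", qty), or ("ask_human", 0).
    """
    rule = rules.get(item)
    if rule is None or on_hand < 0:
        # No rule yet, or an impossible count: a person must decide.
        return ("ask_human", 0)
    if on_hand >= rule.reorder_point:
        return ("none", 0)
    # Simple, rule-driven case: act first, then report to the person.
    return ("reordered", rule.reorder_quantity)


rules = {"napkins": ReorderRule(reorder_point=100, reorder_quantity=500)}
print(handle_stock_level("napkins", 40, rules))  # ('reordered', 500)
print(handle_stock_level("forks", 10, rules))    # ('ask_human', 0)
```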

          Short summary: let people do the thinking and machines do the boring stuff.

        . . . . . . . . . ( end of section Smells and Patterns)

        UML notations for DFDs.

        The UML is not designed to do DFDs. The designers are more concerned with the details and internals of software than with interactions between parts of a larger system. But in the specification of UML2.0 there is a way to document flows between components:

        [UML DFD Symbols and sample context DFD]

        [UML Level 0 DFD]

        At this time (Fall 2009) it is still better to use a traditional notation like the Gane and Sarson notation used in these notes.

      . . . . . . . . . ( end of section DFDs -- Data Flow Diagrams)

      UML Data Models

        We need a simple way to explore and design data. Modeling data turns out to be a powerful technique in analyzing and designing systems.

        Data is always organized in clumps called records. A record has a collection of items of mostly different data types in it. For example the CMS probably has a record that contains all the information about a student in it. Each type of record tends to reflect a real world Entity. Each type of record is given a meaningful name and this is put in the top compartment of a UML class. These entity names should be in your DFD as well.

        Story -- Sharp Wizard Contacts Data

          I've been using small portable computers as Personal Digital Assistants (PDAs) for a long time, and all have had problems of one kind or another. The Sharp Wizard series, for example, had a very annoying way of handling phone numbers. You couldn't enter the name and number of a person without also inputting the title, rank, department, and organization. Not a bad model for a business person, but very irritating for your mother or spouse. Of course: you had to include the address of the organization as well... People didn't have an address of their own. You had to start top-down inputting the company, the department, and then individuals.

          The Palm Pilots and iPods I've been using for 6 or 7 years have a simpler model, with each contact having optional data about companies and titles. But don't get me talking about the different models of events and tasks on iPods and Palm Pilots.

        Modeling Entities and Relationships in the UML

        Use UML class diagrams [ uml1.html ] (notes introducing UML for beginners) with no operations to describe data!

        Here is an example based on a project set in a restaurant.

        [Orders are made of items from a single table served by a single waiter]

        The boxes are logical groups of data, each referring to a real entity. The lines connecting the boxes are significant relationships: for example, which Waiter is assigned to a Table. Also notice that this model does not show any attributes (the properties of the entities). It does not show the waiter's name, for example. This kind of reduced model -- based on ideas about the real world -- is sometimes called a Domain Model. Domain models are very useful for planning data bases (later) and for designing object oriented code (CSCI375).
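        One way to see that a domain model has structure but no attributes is to sketch it as code: the classes below hold only links between entities, with no names, prices, or table numbers. The multiplicities are my reading of the diagram's caption and are assumptions: a Waiter serves many Tables, a Table has at most one Waiter (none before it is assigned), a Table has many Orders, and each Order belongs to exactly one Table and is made of many Items. The helper functions are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Domain model only: entities and links, no attributes such as names.


@dataclass(eq=False)
class Item:
    pass  # a menu item; its attributes would be added later


@dataclass(eq=False)
class Waiter:
    tables: list[Table] = field(default_factory=list, repr=False)  # 1 Waiter -- * Tables


@dataclass(eq=False)
class Table:
    waiter: Waiter | None = field(default=None, repr=False)        # * Tables -- 0..1 Waiter
    orders: list[Order] = field(default_factory=list, repr=False)  # 1 Table -- * Orders


@dataclass(eq=False)
class Order:
    table: Table = field(repr=False)                  # each Order: exactly 1 Table
    items: list[Item] = field(default_factory=list)   # 1 Order -- * Items


def assign(waiter: Waiter, table: Table) -> None:
    """Give a table to a waiter, keeping both ends of the association in step."""
    if table.waiter is not None:
        table.waiter.tables.remove(table)
    table.waiter = waiter
    waiter.tables.append(table)


def place_order(table: Table, items: list[Item]) -> Order:
    """Create an Order for one table and record it on that table."""
    order = Order(table=table, items=list(items))
    table.orders.append(order)
    return order
```

        Keeping both ends of an association consistent, as `assign` does, is a chore the diagram hides. A relational data base avoids it by storing a one-to-many link only once, on the "many" side.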

        Each item of data is given a name and a type:

         		name : type
        Examples
         		address : string
         		initial : char
         		age : int
        Notice I used C++ data types... because my audience (you) has taken CS202 and can be expected to understand them. In general, you should use the words of your audience. When you have multiple audiences, record each audience's words for an item in a data dictionary as aliases.
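        A data dictionary can start life as a simple list. This sketch assumes one possible layout for an entry (a name, a type, a meaning, and a map from each audience to its own word for the item); the class name, function name, and sample entries are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One data dictionary entry (structure assumed for illustration)."""
    name: str                 # the analyst's name for the item
    data_type: str            # in the audience's terms, e.g. "int", "string"
    meaning: str
    aliases: dict[str, str] = field(default_factory=dict)  # audience -> their word


def lookup(dictionary: list[Entry], word: str) -> Entry | None:
    """Find an entry by its own name or by any audience's alias."""
    for entry in dictionary:
        if word == entry.name or word in entry.aliases.values():
            return entry
    return None


data_dictionary = [
    Entry("student_id", "string", "Identifies one student",
          aliases={"registrar": "SID", "faculty": "student number"}),
    Entry("age", "int", "Age in whole years"),
]
print(lookup(data_dictionary, "student number").name)  # student_id
```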

        When you first draw these diagrams you can just list the attribute names and jot down more information in a prototype data dictionary. For example, here is a UML diagram of the data I found in a class roster.

        [TBA]

        If an item is repeated use square brackets:

         		salary_each_month : money [12]
         		children : Person [*]
         		spouse : Person [0..1]
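        The repeated items above map directly onto container types. Here is a sketch in Python, whose `name: type` annotations read almost like the UML lines; using `Decimal` for `money` is an assumption. In C++ the fixed `[12]` would naturally be a `std::array`. A Python list does not enforce its length, so the sketch checks it when a Person is created.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass(eq=False)
class Person:
    name: str
    # [12]: exactly twelve monthly amounts
    salary_each_month: list[Decimal] = field(
        default_factory=lambda: [Decimal("0")] * 12)
    # [*]: zero or more children
    children: list[Person] = field(default_factory=list, repr=False)
    # [0..1]: no spouse, or exactly one
    spouse: Optional[Person] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        if len(self.salary_each_month) != 12:
            raise ValueError("salary_each_month needs exactly 12 values")


ana = Person("Ana")
ana.children.append(Person("Bo"))
print(len(ana.salary_each_month), len(ana.children), ana.spouse)  # 12 1 None
```

        Because children and spouse are themselves Persons, on a class diagram they would normally be drawn as associations rather than attributes.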

        When you meet attributes that are actually other entities/records you should connect the boxes with an association.

        If you know of a significant relationship between records/entities then show it as a line (an association) between the boxes. In fact, in some analysis and design methods, you check every pair (and grouping) of entities looking for important relationships between them.
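        Checking every pair of entities is mechanical enough to generate as a checklist. A sketch with a hypothetical function name; the optional `max_group` parameter also lists larger groupings, for spotting n-ary relationships.

```python
from itertools import combinations


def relationship_checklist(entities, max_group=2):
    """Every pair (and optionally every larger grouping) of entity names,
    each one a question to ask: is there an important relationship here?"""
    questions = []
    for size in range(2, max_group + 1):
        for group in combinations(entities, size):
            questions.append(" / ".join(group))
    return questions


for q in relationship_checklist(["Waiter", "Table", "Order", "Item"]):
    print(q)
```

        With n entities there are n(n - 1)/2 pairs: 4 entities give 6 questions, and 20 entities give 190.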

        Mark these relations with multiplicities:

        • Optional: 0..1
        • Many: *
        • One: 1

        Here is an ERD showing the relationships between Questions, Answers, and Comments in the DFD of my Tutoring System (above):

        Questions have many Answers and Answers are associated with Comments.

        Keep Entity-Relation-Diagrams Simple

        Note: the official database notation developed by Chen is too cumbersome for everyday data analysis and design. Use it only when you have to!

        My old student edition of Rational Rose did UML ERDs well! Dia and Visio can also handle them. But the quickest way (after a field trip, say) is on a board or a piece of paper. Keep the edges of the boxes incomplete until done.

        Notice that you can just note the relationships without any need for attributes. Here is an example that I drew on my Palm Pilot one day.

        [TBA]

        Sometimes I even omit the boxes:

        [TBA]

        Smell -- Unreal Data

        In an ideal system the data perfectly reflects the reality -- it is a "mirror-world". In real systems, the data is often approximate, omits details, and lags behind the real world. When the data has the wrong structure it provides a distorted mirror of the world. But the people in the system may not be aware of this: the file becomes the reality, the computer is the only truth they know.

        Look for lags, errors, missing data, and misfitting structures whenever you are analyzing a situation or system.
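        A program cannot see reality, but it can flag some symptoms. This sketch, with hypothetical field names and an assumed one-year staleness limit, checks records for missing values, implausible values, and lag. Misfitting structure still has to be found by people comparing the data with the world, for example on a field trip.

```python
from datetime import datetime, timedelta


def audit_records(records, required_fields, checks, now, max_age):
    """Flag records that may have stopped mirroring the real world.

    records: list of dicts; each may have a 'last_updated' datetime.
    checks: field name -> function returning True if a value is plausible.
    Returns a list of (record_index, problem) pairs.
    """
    problems = []
    for i, rec in enumerate(records):
        for f in required_fields:  # missing data
            if rec.get(f) in (None, ""):
                problems.append((i, f"missing {f}"))
        for f, plausible in checks.items():  # errors
            value = rec.get(f)
            if value is not None and not plausible(value):
                problems.append((i, f"implausible {f}: {value!r}"))
        updated = rec.get("last_updated")  # lags
        if updated is None:
            problems.append((i, "no update time recorded"))
        elif now - updated > max_age:
            problems.append((i, "stale: may lag behind reality"))
    return problems


today = datetime(2009, 11, 12)
roster = [{"name": "", "age": -4, "last_updated": datetime(2005, 1, 1)}]
print(audit_records(roster, ["name"], {"age": lambda a: 0 <= a < 130},
                    today, timedelta(days=365)))
```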

        Normalizing a UML Data Base

          The following procedure improves the design of the data. It exposes logical structures that are implicit in your data.
          1. Draw an ERD of the entities and relationships with attributes inside the boxes.
          2. Extract all attributes marked with [*] as relations.
          3. Turn all many-to-many and n-ary relations into entities.
          4. Look at 1-to-1 associations: is either '1' (or both) really a '0..1'? If so, add the "0.." and treat the '0..1' as a many '*'. If not, coalesce the two boxes into one.
          5. All associations end up being many-to-1. Redraw with the 1's above the 'many's.

          Simple Normalization in UML
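          A minimal sketch of steps 2 and 3 on one hypothetical case: students who take many courses, where each course has many students. The data and names are illustrative; a real design would give Course and Enrollment their own attributes (a grade, perhaps, on Enrollment).

```python
# Before: each student record hides a repeated attribute, courses : Course [*].
# (Hypothetical sample data.)
students = {
    "S1": {"name": "Ana", "courses": ["CSCI372", "CSCI375"]},
    "S2": {"name": "Bo", "courses": ["CSCI372"]},
}


def normalize(students):
    """Pull the repeated 'courses' out of each student, and turn the
    many-to-many Student--Course relation into an Enrollment entity.
    Every Enrollment row is many-to-1 with Student and many-to-1 with
    Course, as step 5 expects."""
    student_rows = {}
    course_ids = set()
    enrollment_rows = []
    for sid, rec in students.items():
        student_rows[sid] = {"name": rec["name"]}  # no repeated items left
        for cid in rec["courses"]:
            course_ids.add(cid)
            enrollment_rows.append((sid, cid))  # one row per Student-Course link
    return student_rows, sorted(course_ids), enrollment_rows


student_rows, courses, enrollments = normalize(students)
print(enrollments)  # [('S1', 'CSCI372'), ('S1', 'CSCI375'), ('S2', 'CSCI372')]
```

          To list a student's courses, scan the Enrollment rows for that student's id; no record holds a list any more.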

      . . . . . . . . . ( end of section UML Data Models)

      Review Questions

      1. Distinguish physical from logical DFDs.
      2. Name the three types of icon in a DFD.
      3. If there is an arrow from one icon to another in a DFD, what does it mean?
      4. What are the Gane and Sarson icons?
      5. How can you show data flows in the UML2.0?
      6. What is shown on a context DFD of a system? What is not shown?
      7. What is shown in a Level-0 DFD? What is shown in a level-1 DFD?
      8. How do you document the contents of data stores?
      9. How do you document the detailed processing of data?
      10. List the rules that a valid DFD must follow. Then check the list [DFD_Errors] above.
      11. Below is a bad first attempt at the level 0 DFD of my automatic tutoring system. It has many errors. Mark the errors with a big "X" and the number in the list [DFD_Errors] above. For example the "Make Comment" process should be marked X7.

        [Teacher provides Questions and answers and a student answers them ...]

      12. If you discover a person who does no more than reinput some previous output -- how can you improve the system?
      13. Name 6 DFD smells.
      14. Here is a recent example scenario that I experienced:
        1. My doctor told the computer system that I needed a certain screening test.
        2. One month later...
        3. My doctor's assistant sent me snail mail asking why I hadn't had the test done.
        4. I phoned the testing center and they told me that I was not eligible.
        5. When I explained why I needed the test, they said the doctor would have to reinput the request in a different form that specified the reason for the test. (Note: to save money, this data must come from the doctor, not the patient.)
        6. I phoned my doctor and left a message explaining the situation.
        7. The doctor resubmitted the request.

        What smells here? Draw a partial DFD of the situation. Redesign the system to work better.
      15. What is the ultimate reason for storing data in a system?
      16. Is the ERD below normalized? If not, show how to normalize it.

        [Question (1)-(*)Answer (*)-(*)Comment]

      Online Exercises on DFDs

      1. Here [ images?hl=en&q=DFD&btnG=Search+Images&gbv=2 ] is a Google search that produces thousands of DFDs! Some of them are very good and some not so good. Look at them and figure out which notation they use. What do you like and/or dislike about some of them?
      2. List some strange ways that information/data is transmitted/stored in an enterprise that you know about.

      3. Take this diagram [ manufacturing.gif ] and redraw it as a DFD -- note: you can treat some money and material flows as data flows.

      Typical Exam Questions and Exercises on DFDs

      1. Draw a context DFD of:
        1. my web site.
        2. CSUSB's current registration and student records system.
        3. CSUSB CSCI web site.
      2. Draw a simple but correct DFD <TBA>
      3. Given a Context DFD draw a plausible and correct level 0 fish-eye DFD.
      4. Given the fish-eye DFD of a system draw its context DFD.
      5. Given a process in a DFD draw a correct and plausible expansion/fish-eye DFD.
      6. Correct a given DFD model.
      7. Answer questions about a given DFD.

      Exercise -- Context Diagram of a Possible Project

      Either to be done in class and/or assigned as out-of-class project work.

    . . . . . . . . . ( end of section Modeling the Data in a System)

    Abbreviations

  1. TBA::="To Be Announced".
  2. TBD::="To Be Done".

    Also see [ glossary.html ] for more special abbreviations and phrases.
