And here is the reply
Since some students need to sign or accept (they can do the acceptance through MyCoyote Student Grades) the contract before the grade rosters will be available, this feature will be added to the class roster, too.
Our campus requested the Incomplete form as a result of the CSU Student Records Audit that noted we had not received forms for all of the incomplete grades, they were not completed properly when received, and did not have a student signature.....This is considered a contract between the faculty member and the student, so it does require both signatures.
Moral: find out about what data exists, what is needed, and how it can be computed or input.
By the way, also note the implementation strategy: roll out only some of the functionality at each iteration. Start with a good (but incomplete) system and add to it periodically. We will compare this to some alternatives (Big Bang for example) later.
Story -- Available Data affects Reliability and Usage
But when I could read the rosters through a terminal and copy/paste the data into a spreadsheet the errors almost disappeared.
The latest system (CMS) gives a teacher the option to download his or her roster directly as a spreadsheet. This is a very useful feature. It is the fastest and most reliable system I have used.
Similarly, Course Management Systems on this campus (Blackboard, Moodle) also extract CMS data to populate the grading subsystem. Again, having easy access to the data makes the system an improvement over previous systems.
Introduction to data models
System flow charts are a popular and simple yet general model of a system. They
show stuff moving through it, being 'stored' in it, being
'processed' by it, entering, and leaving it. This is a classic
PowerPoint slide! We call the movements
'flows'. Here are some classic flows: goods, money, data, and objects.
All shown as an arrow... which can be confusing.
Computer-based systems are almost entirely about handling data. To analyze and design systems that handle data we need a specialized diagram for showing the flow of data. These are called Data Flow Diagrams. We also need specialized diagrams for showing the structure of data. These are called Entity-Relationship Diagrams.
If you want to change a system it is vital to understand the data in it. The technical feasibility of a new system will often depend on what data is already available. Samples of data (printouts, forms, manual files and records) are a good starting point. So are the descriptions of data in the documentation and source code of any software in the system. But you need to make a more abstract or essential model of two things: (1) the dynamic flow of the data through the system, and (2) the static structure of the data in the system. To master the complexity of a real domain you need diagrams that just show the essentials: how the data moves, where it is stored, and how different data is related. These are best done by drawing DFDs (Data Flow Diagrams) and simple ERDs (Entity Relationship Diagrams). The details are often described in a Data Dictionary and we will cover these later.
Information Technology is all about delivering information to people. Information is data provided to the people who need it, in their preferred format, at the right time. Information needs to be computed reliably, cheaply, and securely. Tracing the flow of data from source to sink is a vital technique to achieve this aim.
Story -- Go with the flow
When I worked in the British Civil Service a colleague described
the following meeting. He had been invited to visit a branch and was there
for the day to consult with them about a new computer system they wanted
to develop. They explained that they wanted a program to print out
a 20 page report. Each page had 20 columns and 50 rows....
He and they worked on the content and format of the report
and half-an-hour before lunch they had the whole thing defined and ready
to be programmed. The programming would be done by another team.
There was, apparently, nothing to do for the next 30 minutes.
So the analyst asked -- "What do you do with the 20 page report?" And a manager replied -- "I look for the row with the largest value in column 17." So my friend asked: "would you like the computer to do that for you?" They replied: "Can a computer do that?" He said "Yes -- and printing one line instead of 1,000 will save money!". They agreed.
So my friend asked: what do you do with the row of data? They told him "We multiply the 2nd column by the 4th column and subtract the 5th column". And he said: "The computer can do that too, if you want." They liked it.
So my friend -- now hot on the track of the end of the data flow -- asked "what do you do with the result you calculated" -- they said "if it is greater than 100 we send a memo to the manager listed in column 1." At last, my friend had found an action... "Could we just send the memo for you and let you know it was sent?"
They then went to lunch at a local pub...
Moral -- always ask where the output data goes to. And contrariwise -- ask where input data comes from.
ERDs and DFDs
We analyze and design
data flows using: external entities (input/source, output/sink), processes,
and stores.
The data flow diagram or DFD is the central diagram used in
information technology. We also analyze the data itself to find out what
its underlying structure is in an ERD.
Once you have a DFD it is useful for pinpointing the changes the enterprise needs to make. You can use DFDs to present the choices to management. They form an excellent start for specifying the hardware and software that will be needed. Meanwhile the static model -- the ERD -- is the starting point for designing a data base and then designing objects inside software.
In summary DFDs and ERDs are a useful intermediate step between problems/opportunities
and solutions/plans.
DFDs -- Data Flow Diagrams
They are good for
Here is an example of a rough pencil and paper DFD:
Each DFD summarizes a collection of simple statements. The above diagram implies
some of the following facts:
Physical and Logical Data Flow Diagrams
A DFD can be used to model the physical or logical structure of a system.
The physical model describes and names
the hardware and software involved. Typically, each process is one program.
Each store is a separate file (think folder in a filing cabinet)
or a table in an existing data base. In other words, physical DFDs
show the architecture of the system not its underlying logic.
In a logical DFD there is no mention of how something is done. No technology is mentioned. Several programs may be inside a single process. One program may implement several processes. Stores are not described in terms of their media (data base, mag tape, disk, RAM,...) but are named for the entities (outside the system) that they store information about (student, teacher, ...).
As a rule you should aim to move to logical DFDs as soon as possible. You can then solve the logical problems in the system without getting confused in the technology. This process produces a top-level design for a new system and is the start for specifying data and programs.
Notations for DFDs
There are three different icons in a DFD: external entity, process, and
store.
There are several different notations: Yourdon and/or De Marco, Gane & Sarson, SSADM (Structured Systems Analysis and Design Method), and Unified Modeling Language have ways of showing DFDs.
The SSADM notation was developed by the British Civil Service (with LBMS Ltd.) from the Gane and Sarson notation. It is used in England and what used to be the British Commonwealth. As far as I can judge the Gane and Sarson form is the most often used notation in the USA. I will use Gane and Sarson and encourage you to do so as well. But different enterprises will use different notations. The Gane and Sarson notation also allows a process box to have three compartments. These are used for: (top) a unique process ID, (middle) a description of the function of the process, and (bottom) the location where the process is carried out or the actor responsible for the process.
Below I have some
notes
[ UML notations for DFDs ]
that show how the UML is used and
explain why you should, for now, use one of the other
notations rather than the UML.
Semantics of DFDs
Semantics of External Entities in DFDs
External entities are outside the current system. There are sources and sinks.
Sources show how
data flows into the system from outside. Sinks show where
data leaves the system. Some entities are both sources and sinks.
We tend to think of entities as being people. But they can be
parts of other systems -- hardware and/or software. The key point is that we
can not redesign external entities. Our system has to fit them.
They are also the main source of disturbances that the system must handle.
We can not control the input from an external source unless we have a process
to handle anything that can happen and sieve out the data that is
needed for our system.
Semantics of Processes in DFDs
Processes are the only active part of a DFD. They are the only place where
results can be computed, data processed, and decisions made. Data does not
flow without there being a process to move it.
Processes are best thought of as continuously running programs. They
handle whole streams of data. They may wait when the data
is not available but they do not stop. They may repeat the same computation
on each item of data as it arrives. They can make decisions and
route input data to
different outputs. Processes can also wait to be asked for data and
then provide it on their outputs. Try to not see them as steps in an
algorithm -- use an activity diagram (later) for algorithms.
Some processes are subsystems. This helps keep the diagrams of
complex systems simple. They are shown as a whole process
in some DFDs. Each is also defined by a DFD. This is called the
refinement of the process. Such processes can contain hidden data stores
and sub-processes. There is a potential tree of refinements.
Semantics of Data Stores in DFDs
Stores are places where data is placed, and where it waits to be
used. Some people use the CRUD mnemonic to describe the interaction between
a process and a data store:
Ultimately the data flows between processes and data stores are (nowadays) programmed using the Structured Query Language (SQL).
SELECT StudentName FROM Student WHERE Student.id = '123-45-6789';
However, it is a mistake to go into this level of detail in a DFD.
On the other hand you should aim to have each data store labeled with the name of a single type of real world object. The data store holds records about all entities of some type or other. The name of the data store should reflect the type of entity. Ultimately they become tables in a database or file.
Traditionally, creating data in a data store -- adding new items -- is shown by an arrow that flows from a process to a data store. Reading data is indicated by an arrow from the store to the process that needs it. Updates and deletions are shown as two-way arrows since data has to be read and then rewritten.
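As a rough illustration of the four CRUD operations behind those arrows, here is a minimal Python sketch using the standard sqlite3 module and a throwaway in-memory database. The Student table layout, the function name crud_demo, and the sample names are assumptions made up for the example. None of this detail belongs in the DFD itself.

```python
import sqlite3

def crud_demo():
    """Run one Create, Read, Update, and Delete against a Student store."""
    db = sqlite3.connect(":memory:")  # throwaway in-memory database
    db.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, StudentName TEXT)")

    # Create: data flows from the process into the store.
    db.execute("INSERT INTO Student VALUES (?, ?)", ("123-45-6789", "Pat Doe"))

    # Read: data flows from the store to the process.
    read = db.execute(
        "SELECT StudentName FROM Student WHERE id = ?", ("123-45-6789",)
    ).fetchone()[0]

    # Update: the record is found and then rewritten (a two-way flow).
    db.execute(
        "UPDATE Student SET StudentName = ? WHERE id = ?",
        ("Pat Q. Doe", "123-45-6789"),
    )
    updated = db.execute(
        "SELECT StudentName FROM Student WHERE id = ?", ("123-45-6789",)
    ).fetchone()[0]

    # Delete: also drawn as a two-way flow.
    db.execute("DELETE FROM Student WHERE id = ?", ("123-45-6789",))
    remaining = db.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

    db.close()
    return read, updated, remaining

print(crud_demo())
```

The query parameters are passed separately (the ? placeholders) rather than pasted into the SQL text, which is the usual way to avoid quoting mistakes.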
Notice that a data store is needed whenever data is reordered or reorganized. On the other hand if the store is a queue or buffer, so that the first item of data to arrive is the first to be output then we don't show a data store: arrows are understood to be buffered by a queue.
Another note: you can simplify diagrams by putting the same data store
in several places. Traditionally you mark stores like this with a
stripe at the left hand end.
Semantics of Arrows in DFDs
The meaning of a data flow (arrow in a DFD) is subtler than you might guess.
It depends on the symbols at each end: process, Entity, or Store.
Notice that only a process can move data. So each data flow must either come from or go to a process. We do not permit data flows to connect entities or stores unless a process is involved.
Connections between processes and entities define the interfaces between the system and its environment. It is rarely unambiguous what data is communicated. Thus these data flows must be described -- at least given a name.
Similarly, it is not clear when you connect one process to another process with an unlabeled arrow what is going on. The arrow needs to be named with the data being transmitted. The name will need further definition (later) in a Data Dictionary. Occasionally you will meet a double-headed arrow -- here someone has to define the protocol that describes the conversation between the two connected processes.
Notice that in real systems (unlike computer programs) data flows between processes are buffered. One process writes the data and the data waits in a queue until the other process reads it. The writer doesn't have to wait for the data to be taken away. For example when you send me Email it is automatically stored before I read it. Similarly "Snail Mail" is put in my box. Memos, rosters, etc. are all buffered for me. So when modeling a real system you don't have to say that data in a data flow is in a queue. This buffering is implicit in the Data Flow model.
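The buffering described above can be pictured with a short Python sketch. It assumes a simple in-memory queue standing in for a mailbox; the names Mailbox, send, and receive are invented for the illustration.

```python
from collections import deque

class Mailbox:
    """A buffered data flow: the writer never waits for the reader."""

    def __init__(self):
        self._queue = deque()

    def send(self, item):
        # The writing process drops the item in the queue and carries on.
        self._queue.append(item)

    def receive(self):
        # The reading process takes the oldest waiting item,
        # or gets None if nothing has arrived yet.
        return self._queue.popleft() if self._queue else None

box = Mailbox()
box.send("memo 1")
box.send("memo 2")      # sent before anyone has read memo 1
first = box.receive()   # oldest item comes out first
second = box.receive()
third = box.receive()   # the buffer is now empty
```

Because every arrow between processes behaves like this, the queue itself never needs to be drawn.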
A data flow out of a store can only go to a process. It indicates that the process reads the data in the store but does not change it. External entities and stores are not allowed to read data directly -- they must get the data indirectly via a process. However, you don't have to label and document these data flows if the process can read the whole store. You only have to document the data flow from a data store if the process accesses only a part of the store.
A data flow into a store must again come from a process. It indicates any combination of the three basic operations: Create, Update, or Delete. Again if the arrow is unlabeled then it is assumed that the process can (or will) change any item in the store.
A double-headed arrow between a data store and a process indicates that the process may: create, read, delete and update the data in the store. Some omit the arrow heads in this case.
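The rule that every data flow needs a process at one end is easy to check mechanically. The following Python sketch assumes a DFD recorded as a dictionary of node kinds plus a list of (source, target) pairs; that representation, the function name illegal_flows, and the sample nodes are all hypothetical.

```python
# Kinds of node in a DFD.
PROCESS, ENTITY, STORE = "process", "entity", "store"

def illegal_flows(kinds, flows):
    """Return the flows that have no process at either end.

    kinds maps a node name to PROCESS, ENTITY, or STORE.
    flows is a list of (source, target) pairs.
    """
    return [
        (src, dst)
        for src, dst in flows
        if kinds[src] != PROCESS and kinds[dst] != PROCESS
    ]

kinds = {
    "Teacher": ENTITY,
    "Record grades": PROCESS,
    "Student": STORE,
    "Registrar": ENTITY,
}
flows = [
    ("Teacher", "Record grades"),  # entity to process: allowed
    ("Record grades", "Student"),  # process to store: allowed
    ("Student", "Registrar"),      # store to entity: no process involved
    ("Teacher", "Registrar"),      # entity to entity: no process involved
]
bad = illegal_flows(kinds, flows)
print(bad)
```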
. . . . . . . . . ( end of section Semantics of DFDs)
Drawing DFDs
Keep DFDs simple by keeping them abstract, logical, or essential
-- don't document the media and format
of the information -- just give it a meaningful name. Note: you
can keep a list of the current or planned media/formats in a
"data dictionary". Similarly a DFD should not show the current type
of a part: people, procedures, hardware, and software all tend to be
implementations of processes. The type of a component should be noted in a
data dictionary (see
[ a5.html ]
).
Do DFDs quickly -- pencil and paper, chalk-board. Only tidy them up when someone else needs to see them. Use a tool only to impress people. However, even when sketching roughly, follow the rules and avoid the errors listed on this page.
Some people put unique short identifiers on each part of a DFD. Avoid this if you can! But in those cases where the boxes are numbered, here are the rules: processes are numbered 1,1.1, 1.2, ... and data stores have an id that starts with "D" plus a number. External entities can be given single lower case letters to be their unique id. These ids are good for linking the same part in different diagrams. For example, the parts numbered 1.1, 1.2, 1.3, etc. are all parts of the process numbered 1. Similarly, 1.2.1, 1.2.3, etc. are subparts of process 1.2.
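If you do number processes this way, the dotted ids encode the refinement tree directly. Here is a small Python sketch of that idea; the helper names parent_id and is_part_of are made up for the example.

```python
def parent_id(process_id):
    """Return the id of the process that this one refines, or None at the top."""
    parts = process_id.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None

def is_part_of(process_id, ancestor_id):
    """True if process_id lies somewhere below ancestor_id in the tree."""
    # The trailing dot stops "11.2" from being mistaken for a part of "1".
    return process_id.startswith(ancestor_id + ".")

print(parent_id("1.2.1"))        # the process that 1.2.1 refines
print(is_part_of("1.2.3", "1"))  # is 1.2.3 inside process 1?
```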
Never use more than one piece of paper for a DFD. The trick is to have layers of detail. We do this by expanding, exploding, or refining a process into a lower level diagram. This is done by taking a process and drawing a DFD that would replace it in the original DFD. There are three levels of detail commonly needed: context, level-0, and level-1. Here is a picture of how refinement works:
The table shows the three types of DFD and is followed by definitions and examples.
Table
| Level | Content |
|---|---|
| Process Context | Shows one process with its inputs and outputs only. |
| System Context | One process + surrounding external entities |
| Level-0 | Make the central process BIG and draw stores, processes, and flows inside |
| Level-1 | Take a process on the level-0 and repeat the expansion in another DFD |
| Level-n+1 | Take a level n process and refine it. |
A Note on level terminology
I will be following well known textbooks on the naming of the levels. The Wikipedia seems to
use a different form.
Refining a DFD
The process of finding out what is inside a process has many names:
leveling, refinement, filling in the details, partitioning, exploding, decomposing, ... It is an important strategy for analyzing a problem.
Start with the big picture -- the context -- then break it into smaller
and smaller parts.
Ultimately, as you decompose or refine processes, you will find yourself
needing to express logical rules, algorithms, and types of data.
Do not use a DFD to express complex logic, algorithms, or data structures.
Instead, record these details by using techniques introduced later in this
course:
Table
| Part of the DFD | Techniques for recording the details |
|---|---|
| Processes | Activity diagrams, Use Cases, and Scenarios. Prototypes. |
| External Entities | Persona |
| Data flows | Data dictionary entries and coding techniques. |
| Stores | Entity Relationship Diagrams, Tables, and Normalization |
The above is a top-down procedure. You can also draw rough DFDs of parts of the organization and link them together to get an "end-to-end" model. Here is an example from the first time this course was taught.
These tend to be a little chaotic and unstructured.
Principle -- DFDs are systems not programs
My Law: DFDs are good for recording how a system works. They are
a way of choosing what parts of a system to change and which to
protect. They can be used to define the inputs and outputs to
a program. You can use them to plan a collection of new and
old software (system design). BUT don't use them to design the internals
of a program. You will make errors. There are more modern techniques for
designing programs.
Rules of DFDs -- DFD Errors
Notice and learn the rules below. The
key thought is that data never moves unless a process moves it.
Advice
Notice we can schedule the above DFD in many ways. We can run the analysis process until it produces an idea, then pass it to the design process, which can modify the plan that triggers implementation activity. It all depends on the size of the change to the model and the plan whether we get a traditional or an agile life cycle.
Pattern -- Stores contain a model of reality
In nearly all systems the purpose of storing data is to capture a picture
of some reality. So, name stores after the entities that they model. For example
a file containing student records should be shown as a store named "Student".
Exception: a physical DFD names formats and media and so defines an architecture.
DFD Smell -- useless storage
Be suspicious of data stores that have inputs without outputs or have
outputs without inputs. Storing something that is never needed is wasteful.
Having data that can not be altered or created (no input) is a problem waiting to
happen. Example: When I moved office I found I had two filing cabinet drawers
full of unread paperwork. I threw it out and plan to not keep it again.
Exception: there may be some law that requires you to keep some data for
a number of years. Find out if this is true.
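This smell can be spotted mechanically once the flows are written down. The Python sketch below assumes the same kind of hypothetical representation as a list of (source, target) pairs plus a set of store names; the function name and the sample stores are invented for the illustration.

```python
def suspicious_stores(stores, flows):
    """Map each suspicious store to the reason it smells.

    stores is a set of store names; flows is a list of (source, target) pairs.
    """
    written = {dst for _, dst in flows if dst in stores}
    read = {src for src, _ in flows if src in stores}
    report = {}
    for store in sorted(stores):
        if store in written and store not in read:
            report[store] = "never read"
        elif store in read and store not in written:
            report[store] = "never written"
        elif store not in written and store not in read:
            report[store] = "unconnected"
    return report

stores = {"Student", "Old paperwork", "Tax table"}
flows = [
    ("Enroll", "Student"),
    ("Student", "Print roster"),
    ("File memo", "Old paperwork"),  # goes in, never comes out
    ("Tax table", "Compute pay"),    # comes out, never goes in
]
report = suspicious_stores(stores, flows)
print(report)
```

A flagged store is only a prompt to ask questions: the legal-retention exception above is a perfectly good answer.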
DFD Smell -- wasted motion
Take note of processes that merely move things around in a system,
especially when it is data transmitted as paperwork!
Pattern -- Remove paperwork
One of the traditional improvements is to replace paperwork data flows and storage by electronic forms.
DFD Smell -- old technology
As you abstract away from the current technology to an abstract set
of data flows, processes, and stores, take note of processes and
storage that use old technology. But don't clutter the DFD!
These
are candidates for replacement in the new system. Perhaps, when you
present your problem to management you could color the old technology
red?
Don't forget that sometimes an old technology is more reliable than brand new
technology: some old ways of doing things need to be preserved.
DFD Smell -- Overloaded Process
When data (and stuff) flows through an organization it can pile up
in buffer zones. A person's desk can slowly disappear under the incoming
paperwork, for example. Look for processes that handle their
input slower than it arrives. Even if it can just handle the average
rate, queuing theory shows that the length of the queue grows without
limit.
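One way to see this is the textbook single-server queue known as M/M/1, a standard model that these notes do not name. It assumes random (Poisson) arrivals and exponentially distributed service times. Under those assumptions the long-run average number of items waiting or being served is u / (1 - u), where u is the arrival rate divided by the service rate. The Python sketch below evaluates that formula; the function name and the sample rates are made up.

```python
def mean_items_in_system(arrival_rate, service_rate):
    """Average number of items in an M/M/1 queue (waiting plus in service).

    Returns float('inf') when the server cannot keep ahead of the arrivals,
    because no steady state exists in that case.
    """
    utilization = arrival_rate / service_rate
    if utilization >= 1:
        return float("inf")
    return utilization / (1 - utilization)

# A server that can handle 10 items per hour, under increasing load.
for arrivals in (5, 9, 9.9, 10):
    print(arrivals, mean_items_in_system(arrivals, 10))
```

The average backlog climbs steeply as the arrival rate approaches the service rate, and at equality the formula has no finite value, which matches the claim that a process that can only just keep up is already overloaded.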
The ideal solution is to have multiple copies of the process running in parallel. Input is distributed to the least loaded or first available server that can run the process. The next solution is to find ways of speeding up the process: better technology, simpler logic, ... Simple examples of this strategy are the faster CPU or enlarged RAM beloved of the PC technophiles. But a subtler variation is reorganizing the data storage to give faster access to the data. This trick includes defragmenting disk drives. A third solution is to provide multiple parallel clones of the process running on multiple processors.
As an example, high traffic web sites may have a dozen web servers and a special load balancing "switching" server front end.
Note -- multiple computers all running the same process are still a single process in the DFD!
DFD Smell -- Inefficient, Intractable, and/or Non-computable Processes
Look out for inefficient processes. These are often concerned with
reorganizing data in some way or other. Many times a clever design
can make them run a lot more efficiently. Sometimes you can remove the
need for the process entirely by rethinking the design.
Be aware that a process can often be implemented with many different
algorithms and each algorithm will perform differently. You may have to
specify
an algorithm or give feasible limits on the efficiency of the implementation
of a process.
Computer Science has discovered a large family of processes that can not be programmed. We have also discovered processes that, as far as we know, need very inefficient programs to solve them. It is worth studying theory to be able to spot these.
There are also processes that are better done by a human than a machine. Ethical questions should not be handled by machines! Questions needing discretion should involve humans. Sometimes you need to design systems that support communication and cooperation so that complex (political) problems can be resolved by humans.
DFD Smell -- Under-worked Human
You will often find systems where an event
occurs and triggers a message that is sent to a person, who
in return does nothing with the message but pass it on to another part of the system.
This smell is worst when the human has a fixed
simple procedure that they use to respond to the disturbance. I recently heard of
an example: a computer system turned a light on and a human had to hit a button
to avoid disaster when the light turned on. The person did nothing and a disaster occurred.
This is bad design. Another classic example is any web site that expects you to type in data
shown to you or even input by you on a previous page!
A version of this is sending data to a human to re-input later. This introduces errors... there must be a better way to handle the problem. Here is the smelly system and a possible improvement.
Pattern -- Automate Simple feedback
If the choice of action in the above system can be computed from
the message then a better system automatically carries out the action and
reports to the person. The sensing + acting system should
only ask for help on the difficult decisions.
The best systems allow the person to
input and update the desirable actions.
Examples: EMail -- automatically deleting messages that we don't need to see. Inventory -- automatically reorder when stocks get below a certain level. Record people's browsing, let them replay and/or edit the recordings.
Short summary: let people do the thinking and machines do the boring stuff.
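The inventory example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name handle_stock_level, the rules dictionary that the responsible person can edit, and the place_order and notify callbacks that stand in for the rest of the system.

```python
def handle_stock_level(item, on_hand, rules, place_order, notify):
    """Reorder automatically when a rule covers the item; otherwise ask a person.

    rules maps an item to (reorder_level, reorder_quantity) and can be
    edited by the person responsible for the stock.
    """
    if item not in rules:
        # A difficult case: no rule exists, so a human has to decide.
        notify(f"No rule for {item}: please decide what to do")
        return "asked"
    reorder_level, quantity = rules[item]
    if on_hand < reorder_level:
        place_order(item, quantity)
        notify(f"Reordered {quantity} of {item}")  # report, do not ask
        return "reordered"
    return "ok"

orders, messages = [], []
rules = {"paper": (10, 50)}

def record_order(item, quantity):
    orders.append((item, quantity))

low = handle_stock_level("paper", 4, rules, record_order, messages.append)
fine = handle_stock_level("paper", 20, rules, record_order, messages.append)
unknown = handle_stock_level("toner", 1, rules, record_order, messages.append)
```

The person only hears about routine reorders after the fact and is asked to think only when no rule applies.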
. . . . . . . . . ( end of section Smells and Patterns)
UML notations for DFDs.
The UML is not designed to do DFDs. The designers are more concerned
with the details and internals of software than with interactions between
parts of a larger system. But in the specification of UML2.0 there is a way
to document flows between components:
At this time (Fall 2009) it is still better to use a traditional notation like the Gane and Sarson used in these notes.
. . . . . . . . . ( end of section DFDs -- Data Flow Diagrams)
UML Data Models
Data is always organized in clumps called records. A record has a collection of items of mostly different data types in it. For example the CMS probably has a record that contains all the information about a student in it. Each type of record tends to reflect a real world Entity. Each type of record is given a meaningful name and this is put in the top compartment of a UML class. These entity names should be in your DFD as well.
Story -- Sharp Wizard Contacts Data
The Palm Pilots and iPods I've been using for 6 or 7 years have a simpler model with each contact having optional data about companies and titles. But don't get me talking about the different models of events and tasks on iPods and Palm Pilots.
Modeling Entities and Relationships in the UML
Use UML class diagrams
[ uml1.html ]
(notes introducing UML for beginners)
with no operations to describe data!
Here is an example based on a project set in a restaurant.
The boxes are logical groups of data each referring to a real entity. The lines connecting the boxes are significant relationships, for example: which Waiter is assigned to one Table. Also notice that this model does not show any attributes (the properties of the entities). It does not show the waiter's name for example. This kind of reduced model -- based on ideas about the real world -- is sometimes called a Domain Model. They are very useful for planning data bases (later) and for designing object oriented code (CSCI375).
Each item of data is given a name and a type:
name : type
Examples
address : string
initial : char
age : int
Notice I used C++ data types... because my audience (you) has taken CS202 and can be expected to understand them. In general, you should use the words of your audience. With multiple audiences put different meanings in a data dictionary as aliases.
When you first draw these diagrams you can just list the attribute names and jot down more information in a prototype data dictionary. For example here is a UML diagram of the data I found in a class roster.
If an item is repeated use square brackets:
salary_each_month [12] : money
children : Person[*]
spouse : Person [0..1]
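To show how the bracket notation might carry over into code, here is a hedged Python sketch of the three attributes as a dataclass. The class layout, the name field, and the use of float for money are assumptions for illustration only (a real system would more likely use a decimal type for money).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Person:
    name: str
    # salary_each_month [12] : money -- exactly twelve amounts
    salary_each_month: List[float] = field(default_factory=lambda: [0.0] * 12)
    # children : Person[*] -- zero or more
    children: List["Person"] = field(default_factory=list)
    # spouse : Person [0..1] -- at most one, so it may be absent
    spouse: Optional["Person"] = None

    def __post_init__(self):
        # Enforce the fixed multiplicity [12].
        if len(self.salary_each_month) != 12:
            raise ValueError("salary_each_month needs exactly 12 entries")

ann = Person("Ann")
ann.children.append(Person("Kim"))
print(len(ann.salary_each_month), len(ann.children), ann.spouse)
```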
When you meet attributes that are actually other entities/records you should connect the boxes with an association.
If you know of a significant relationship between records/entities then show it as a line (an association) between the boxes. In fact, in some analysis and design methods, you check every pair (and grouping) of entities looking for important relationships between them.
Mark these relations with multiplicities:
Here is an ERD showing the relationships between Questions, Answers, and Comments in the DFD of my Tutoring System (above):
Keep Entity-Relation-Diagrams Simple
Note: the official database notation developed by Chen is too cumbersome
for everyday data analysis and design. Use it only when you have to!
My old student edition of Rational Rose did UML ERDs well! Dia and Visio can also handle them. But the quickest way (after a field trip, say) is on a board or a piece of paper. Keep the edges of the boxes incomplete until done.
Notice.... that you can just note the relationships without any need for attributes. Here is an example that I drew on my Palm Pilot one day.
Sometimes I even omit the boxes:
Smell -- Unreal Data
In an ideal system the data perfectly reflects the reality -- it is a
"mirror-world".
In real systems the data
is often approximate, omits details, and lags behind
the real world. When the data has the wrong structure it provides
a distorted mirror of the world.
But the people in the system may not be aware of this: the
file becomes the reality, the computer is the only truth they know.
Look for lags, errors, missing data, and for misfitting structures whenever you are analyzing a situation or system.
. . . . . . . . . ( end of section UML Data Models)
Review Questions
. . . . . . . . . ( end of section Modeling the Data in a System)
Abbreviations
Also see [ glossary.html ] for more special abbreviations and phrases.