[CS320] hours.html Sat Dec 23 08:00:03 PST 2006


    Mathematics in Action: What jobs have been running too long


      To show that an ounce of math is worth a pound of random programming.


      As part of my task of keeping an FTP server running I need to be able to terminate FTP sessions that have been running for more than a certain number of hours - which I call the 'grace period'. I can get a list of these sessions with either the date or the time when they started in a particular position. If there is a date then the session is more than 24 hours long and needs killing. When there is a start time then this is a number between 0 and 23 inclusive and indicates the time on a twenty-four hour clock. I can also get hold of the time now on the same twenty-four hour clock. All these times are whole numbers: 8 means 8 in the morning and 20 means 8 in the evening. 23 means 11 at night and 0 stands for midnight.

      My problem is to get a function that tells me whether or not the difference between two times is greater than a given grace period, when the times are available only as 24-hour clock times.


      When the grace period is 4 hours and a job started at 4 (24 hour clock time) then it needs to be killed when the 24 hour time is 8. If it starts at 23 (on the 24 hour clock) then it needs to be stopped at 3.

      Quality Requirement

      We do not want to be sued for stopping jobs before the announced time. We cannot afford to let jobs accumulate: the system crashes.


      This is a real problem.

      Formal Analysis

      Use C.expressions.


    1. s::int=the start time,
    2. S::0..23=the start time on the 24 hour clock.
    3. t::int=the time now,
    4. T::0..23=the time on the 24 hour clock.
    5. g::0..23=the grace period in hours.


    6. D::int=T-S, the difference between the clock times. We want to determine whether
    7. d::int=t-s, the true elapsed time, satisfies
    8. d < g or d >= g.

      We will use the C/C++ notation for remainder or modulus:

    9. For int i,j>0, i%j::=i - j*(i/j). We can show that
    10. (above)|- (r0): For i,j>0, i=i%j+j*(i/j).
    11. (above)|- (r1): For i,j>0, for some k (i=k*j+i%j).

    12. (the |- symbol is used to indicate that a formula has been asserted to be true)
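
      The identity (r0) can be checked mechanically. The sketch below uses two hypothetical helper names, r0_holds and r0_holds_up_to, that are not part of the original notes. It also illustrates why the definition is restricted to i,j>0: in C99 and later, integer division truncates toward zero, so i%j is zero or negative when i is negative.

```c
#include <stdbool.h>

/* Hypothetical helper: true when i == i%j + j*(i/j), the identity (r0). */
bool r0_holds(int i, int j)
{
    return i == i % j + j * (i / j);
}

/* Hypothetical helper: check (r0) for every pair with 1 <= i,j <= limit. */
bool r0_holds_up_to(int limit)
{
    for (int i = 1; i <= limit; i++)
        for (int j = 1; j <= limit; j++)
            if (!r0_holds(i, j))
                return false;
    return true;
}
```

      With C99 semantics -20 % 24 evaluates to -20, not 4, so a negative difference such as T-S cannot simply be passed through % to get an hour in 0..23.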

      We have been told:

    13. |- (1): T = t % 24
    14. |- (2): S = s % 24
    15. |- (3): 0 <= t-s and t-s < 24.

      So we deduce:

    16. (1, 2)|- (4): (T-S)%24 = (t-s)%24.
    17. (4, r1)|- (5): for some k, d = D + 24*k
    18. (1, 2)|- (6): 0 <= T < 24 and 0 <= S < 24, so -24 < D < 24,
    19. (5, 6, 3)|- (7): d in { ..., D-24, D, D+24, D+48, ...} and d in 0..23,
    20. so d in { D, D+24 }.
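
      Claim (7) can also be confirmed by brute force. The following is a minimal sketch, assuming start times s >= 0 so that the C % operator agrees with the clock; the function name claim7_holds is hypothetical. It checks that the true elapsed time d equals D when D >= 0 and D+24 when D < 0.

```c
#include <stdbool.h>

/* Check (7) by brute force: for start times 0 <= s <= max_start and
 * elapsed hours d = t - s in 0..23, the clock difference D = T - S
 * satisfies d == D when D >= 0 and d == D + 24 when D < 0. */
bool claim7_holds(int max_start)
{
    for (int s = 0; s <= max_start; s++) {
        for (int d = 0; d < 24; d++) {
            int t = s + d;       /* the time now */
            int S = s % 24;      /* (2): start time on the clock */
            int T = t % 24;      /* (1): time now on the clock */
            int D = T - S;
            int expected = (D < 0) ? D + 24 : D;
            if (d != expected)
                return false;
        }
    }
    return true;
}
```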


    21. (above)|- (D0): D<0 or D=0 or D>0.
    22. (7, 5)|- (11): if D=0 then d<g.
    23. (6, 7)|- (15): if D<0 then ( d<g iff D+24<g ).
    24. (7)|- (18): if D>0 then (d<g iff D<g).

    25. (D0, 11, 15, 18)|- (20): if D<0 then (d<g iff D+24<g) else (d<g iff D<g).

      C and C++ Code

      We would therefore use the following function
      	int within_grace(int T, int S, int g)	/* name assumed; returns 1 when d<g */
      	{
      		if (T-S < 0)
      			return T-S+24 < g;
      		else
      			return T-S < g;
      	}
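
      Since there are only 24 start hours, 24 possible elapsed times and 24 grace periods, the function can be tested exhaustively against the true elapsed time. This is a sketch rather than the author's own test: the names within_grace and within_grace_agrees_with_elapsed are assumptions, and the function body repeats the one in the text so that the block stands alone.

```c
#include <stdbool.h>

/* The function from the text; the name within_grace is an assumption.
 * Returns 1 when the elapsed time is still less than the grace period. */
int within_grace(int T, int S, int g)
{
    if (T - S < 0)
        return T - S + 24 < g;   /* the clock wrapped past midnight */
    else
        return T - S < g;
}

/* Compare within_grace against the true elapsed hours d for every
 * start hour S, elapsed time d in 0..23, and grace period g in 0..23. */
bool within_grace_agrees_with_elapsed(void)
{
    for (int S = 0; S < 24; S++)
        for (int d = 0; d < 24; d++)
            for (int g = 0; g < 24; g++) {
                int T = (S + d) % 24;   /* clock time now */
                if (within_grace(T, S, g) != (d < g))
                    return false;
            }
    return true;
}
```

      The two examples given earlier come out as expected: with a grace period of 4, within_grace(8, 4, 4) and within_grace(3, 23, 4) are both 0, so those jobs are due to be stopped.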

      Actual Implementation

      The "ftp.hangman" program is a shell script:
      	: Hang up day old FTP sessions.  Uses BSD -u option, RTFM
      	ps -axu >$tmp
      	grep "^ftp" <$tmp |
      	awk '$9!~/:/{print $2}' |
      	while read pid; do kill -9 $pid; done
      	rm $tmp

      This works because the set of running jobs is listed in $tmp by the ps -axu command. The first "word" on a line is the owner of the job, and so grep "^ftp" extracts only the FTP processes. The ps -axu format puts the time or month as the 9th word on the line and the process ID as the second word. Hence the awk command selects the processes that are 24 or more hours old and outputs their process identifiers. This set of IDs (the old ftp processes) is read in, one at a time, by the while - do - done loop and killed.
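
      For readers more at home in C than awk, the field test can be sketched as follows. The helper names nth_word and is_day_old are hypothetical, and the sketch assumes fields are separated by spaces, tabs or newlines. It behaves like the awk pattern $9!~/:/ except that a line with fewer than nine words is rejected here, whereas awk would treat the missing field as empty and select the line.

```c
#include <stdbool.h>
#include <string.h>

/* Copy the n-th (1-based) whitespace-separated word of line into out,
 * truncating if necessary. outsize must be at least 1.
 * Returns false when the line has fewer than n words. */
bool nth_word(const char *line, int n, char *out, size_t outsize)
{
    const char *p = line;
    for (int i = 1; ; i++) {
        p += strspn(p, " \t\n");            /* skip separators */
        if (*p == '\0')
            return false;
        size_t len = strcspn(p, " \t\n");   /* length of this word */
        if (i == n) {
            if (len >= outsize)
                len = outsize - 1;
            memcpy(out, p, len);
            out[len] = '\0';
            return true;
        }
        p += len;
    }
}

/* True when word 9 exists and contains no ':', that is, when it is a
 * date rather than a time of day. */
bool is_day_old(const char *ps_line)
{
    char start[32];
    return nth_word(ps_line, 9, start, sizeof start)
        && strchr(start, ':') == NULL;
}
```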

      Clearly the only change needed is to check for the presence of a ":" in the 9th word, extract the hour (when it started) and compare:

        awk '$9!~/:/{print $2}
      	     $9~/:/{ split($9, hm, ":"); S=hm[1];	# S = start hour; this line is an assumed reconstruction
      		if(T-S>=g || T-S<0 && (T-S+24)>=g)print $2;
      	      }' g=$grace T=`date +%H`


      There is a hopeful theory that a program that is properly developed does not need testing. Given the unprovable state of most software tools this is a dangerous theory. In this case testing showed up a bug caused by the weak typing in 'awk'. The correct statement to select outstanding jobs is:
                if(T-S>=0+g || T-S<0 && (T-S+24)>=0+g)print $2;
      since g is interpreted as a string, and so lexicographic rather than numeric ordering is used in the comparisons.
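
      The trap is easy to reproduce outside awk. The sketch below, with the hypothetical function names compare_as_strings and compare_as_numbers, contrasts the two orderings in C: compared as text, "10" sorts before "4" because the character '1' precedes '4', while the numeric comparison gives the opposite answer.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two decimal strings as text: negative when a sorts before b. */
int compare_as_strings(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Compare two decimal strings by value: -1, 0 or 1. */
int compare_as_numbers(const char *a, const char *b)
{
    long x = strtol(a, NULL, 10);
    long y = strtol(b, NULL, 10);
    return (x > y) - (x < y);
}
```

      Adding 0 to g in the awk condition plays the same role as strtol here: it forces a numeric value before the comparison is made.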


      Notice that the condition was not wrong. It is the coding that fell into a well known and avoidable trap.

    . . . . . . . . . ( end of section What jobs have been running too long)

    Proof of 5

    Substitute D for T-S, and d for t-s.
  1. (above)|- (5.1): D%24=d%24,
  2. (r1, 5.1, ei, k=>k1, ei, k=>k2)|- (5.2): D-24*k1=d-24*k2,
  3. (5.2)|- (5.3): d= D-24*k1+24*k2,
  4. (5.3)|- (5.4): d= D+24*(k2-k1),
  5. (5.4, eg, k=>k2-k1)|- (5.5): d= D+24*k.

    Proof of 11


    1. |- (8): D=0.
    2. (8, 7, 3)|- (9): d=0,
    3. (9)|- (10): d<g, provided g>0.

    (Close Let )

    Proof of 15


    1. |- (12): D<0,
    2. (12, 6)|- (13): D> -24,
    3. (13, 7)|- (14): d = D+24,

    (Close Let )

    Proof of 18


    1. |- (16): D>0.
    2. (16, 7)|- (17): d=D.

    (Close Let )


    (ei): The "existential instantiation" rule of inference [ ei in logic_2_Proofs ]
    (eg): The "existential generalization" rule of inference [ eg in logic_2_Proofs ]