---
title: "Getting mortality from atlantisom: testing calc_Z function II"
author: "Sarah Gaichas and Christine Stawitz"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

```

## Introduction

This page documents initial testing of the atlantisom package in development at https://github.com/r4atlantis/atlantisom using three different [Atlantis](https://research.csiro.au/atlantis/) output datasets. Development of atlantisom began at the [2015 Atlantis Summit](https://research.csiro.au/atlantis/atlantis-summit/) in Honolulu, Hawaii, USA.
On this page we demonstrate use of atlantisom on both California Current (CCA) and Norwegian Barents Sea Atlantis (NOBA) output files to test function `calc_Z` calculating total mortality. This is used both to derive "natural mortality" for comparison with stock assessments and to split age classes into true ages using the function `calc_stage2age`.

NOBA has output files in true ages that we can compare results with. These files are in the folder NOBAwithAnnAgeOutput (not kept on github due to large filesize).

We can also compare both models with the output[scenario.name]Mort.txt file, which reports annual M and F. The user's manual notes:

> This file is currently only useful for looking at relative M vs F values for a species, as it does not give accurate mortalities

So we can assume that the scale of Z from this file will not be the same as our calculation, but it should be useful for splitting out F to get M, and for comparing overall patterns.

We will first use CCA here because the contrasting F scenario has been implemented. We will test with both sardines (M needed, but not age splitting) and hake (M and age splitting needed).

All model setup and configuration is described [here](https://sgaichas.github.io/poseidon-dev/FullSardineTruthEx.html).

```{r message=FALSE, warning=FALSE}
library(tidyr)
library(dplyr)
library(ggplot2)
library(data.table)
library(here)
library(ggforce)
library(ggthemes)
library(atlantisom)
```

```{r initialize}

initCCA <- TRUE
initNEUS <- FALSE
initNOBA <- FALSE

species_ss <- c("Pacific_sardine")

#function to make a config file? need one for each atlantis run

if(initCCA) source(here("config/CC2Config.R"))

if(initNEUS) source(here("config/NEUSConfig.R"))

if(initNOBA) source(here("config/NOBAaaConfig.R"))

```

```{r get_names, message=FALSE, warning=FALSE}
#Load functional groups
funct.groups <- load_fgs(dir=d.name,
                         file_fgs = functional.groups.file)
#Get just the names of active functional groups
funct.group.names <- funct.groups %>%
  filter(IsTurnedOn == 1) %>%
  select(Name) %>%
  .$Name

```

```{r get_truth, message=FALSE, warning=FALSE}

# default run_truth setup will save the file, so check for that first

if(!file.exists(file.path(d.name,
                          paste0("output", scenario.name, "run_truth.RData")))){
  #Store all loaded results into an R object
  truth <- run_truth(scenario = scenario.name,
                     dir = d.name,
                     file_fgs = functional.groups.file,
                     file_bgm = box.file,
                     select_groups = funct.group.names,
                     file_init = initial.conditions.file,
                     file_biolprm = biol.prm.file,
                     file_runprm = run.prm.file
  )
} else{
  truth <- get(load(file.path(d.name,
                              paste0("output", scenario.name, "run_truth.RData"))))
}
```

<!-- not sure we need catch bio for this
```{r loadcatchb}

truecatchbio <- load_catch(d.name, file_catch = catch.file, fgs = funct.groups)

truecatchbio_ss <- truecatchbio[truecatchbio$species == species_ss,]
# note: time output of this file is in days
# what model timestep is this output matching?
# rescale catch in biomass time from days to years
truecatchbio_ss$time <- as.integer(truecatchbio_ss$time/365)

```
-->

## Test calc Z function

This section tests the `atlantisom::calc_Z()` function step by step. Ultimately this output goes to `calc_stage2age` to produce annual numbers at true age, which we can compare with the ANNAGEBIO.nc output produced by NOBA. For now we will just compare to the Mort.txt output.

```{r census-spec, message=FALSE, warning=FALSE}

# make a function for this
source(here("config/census_spec.R"))

```

Output timestep toutinc for the population is `r runpar$toutinc `,

so steps per year in run_truth output is `r stepperyr `

and the number of output steps in run_truth output is `r noutsteps `.
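As a rough sketch of how these three quantities relate: the actual definitions live in `config/census_spec.R`, which is not shown on this page, so the formulas and the example values below (`toutinc_a`, `tstop_a`) are assumptions for illustration only.

```r
# Hypothetical run parameters; the real values come from the run.prm file
toutinc_a <- 73      # output timestep in days (assumed)
tstop_a   <- 18250   # length of the run in days (assumed, 50 years)

# Output steps per 365-day model year
stepperyr_a <- 365 / toutinc_a

# Output steps over the whole run (assumed definition)
noutsteps_a <- tstop_a / toutinc_a

stepperyr_a
noutsteps_a
```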

### True numbers total (annual)

This gives us the true total numbers on an annual basis. We can compare this with age specific outputs.

```{r surveyNbased, eval=FALSE}

# this uses result$nums and a new function to get survey index in numbers (abundance)

survey_testNall <- create_survey(dat = truth$nums,
                                 time = timeall,
                                 species = survspp,
                                 boxes = boxall,
                                 effic = effic1,
                                 selex = selex1)

# save this, needed as input to calc_Z?
saveRDS(survey_testNall, file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# as above, make up a constant 0 cv for testing
surv_cv <- data.frame(species=survspp, cv=rep(0.0,length(survspp)))

surveyN <- sample_survey_numbers(survey_testNall, surv_cv)

#save for later use, takes a long time to generate
saveRDS(surveyN, file.path(d.name, paste0(scenario.name, "surveyNcensus.rds")))

```

```{r survNtrue-plot}

surveyN <- readRDS(file.path(d.name, paste0(scenario.name, "surveyNcensus.rds")))

surveyN_ss <- surveyN[surveyN$species == species_ss,]

plotN <-ggplot() +
  geom_line(data=surveyN_ss, aes(x=time/stepperyr,y=atoutput, color="survey census N"),
            alpha = 10/10) +
  theme_tufte() +
  theme(legend.position = "top") +
  labs(colour=scenario.name)

plotN +
  facet_wrap(~species, scales="free")

```


### What level of aggregation works for calc_Z?

The atlantisom function `calc_stage2age` calls the `calc_Z` function, applying a total mortality rate to estimate the numbers at true age within an age class, but it is unclear whether to use this on the aggregated numbers or the full resolution numbers at age. Here we test outputs of Z for several levels of aggregation:

```{r testZagg}
# add YOY file to the config files
YOY <- load_yoy(d.name, paste0("output", scenario.name, "YOY.txt"))

# load biolprm in some initialize file?
biol <- load_biolprm(d.name, biol.prm.file)

# get code matching species name to split YOY file
code_ss <- funct.groups$Code[which(funct.groups$Name == species_ss)]

# cut to a single species in YOY file
YOY_ss <- YOY %>%
  select(Time, paste0(code_ss, ".0"))

# numbers at agecl at full resolution (all polygons and layers)
truenums_ss <- truth$nums[truth$nums$species == species_ss,]

survey_testNall <- readRDS(file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# numbers at agecl aggregated over layers, retain polygons (output of create_survey)
survey_testN_ss <- survey_testNall[survey_testNall$species == species_ss,]

# numbers at agecl aggregated over layers and polygons
Natage_ss <- readRDS(file.path(d.name, paste0(scenario.name, "natage_census_sard.rds")))

#calc_Z <- function(yoy, nums, fgs, biolprm, toutinc)

fullresZ <- calc_Z(yoy = YOY_ss,
                   nums = truenums_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

surveyresZ <- calc_Z(yoy = YOY_ss,
                   nums = survey_testN_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

sampleresZ <- calc_Z(yoy = YOY_ss,
                   nums = Natage_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)


```
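The debugging code kept in the Rmd source shows that the Z calculation begins by summing numbers over boxes, layers and cohorts to a species-by-time total. A minimal sketch with made-up data (the `toy_` objects are hypothetical, not Atlantis output) suggests why the three input resolutions should then give essentially the same totals:

```r
set.seed(1)

# Toy "full resolution" numbers: 2 age classes x 3 polygons x 2 layers x 4 times
toy_nums <- expand.grid(species = "toy_fish", agecl = 1:2, polygon = 0:2,
                        layer = 0:1, time = 0:3, stringsAsFactors = FALSE)
toy_nums$atoutput <- runif(nrow(toy_nums), 1e5, 1e6)

# "Survey" resolution: sum over layers, keep polygons
toy_survey <- aggregate(atoutput ~ species + agecl + polygon + time,
                        data = toy_nums, FUN = sum)

# "Sample" resolution: sum over layers and polygons
toy_natage <- aggregate(atoutput ~ species + agecl + time,
                        data = toy_nums, FUN = sum)

# Collapse each input to species-by-time totals
tot_full   <- aggregate(atoutput ~ species + time, data = toy_nums,   FUN = sum)
tot_survey <- aggregate(atoutput ~ species + time, data = toy_survey, FUN = sum)
tot_natage <- aggregate(atoutput ~ species + time, data = toy_natage, FUN = sum)

# Equal up to floating point tolerance
all.equal(tot_full$atoutput, tot_survey$atoutput)
all.equal(tot_full$atoutput, tot_natage$atoutput)
```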

Fixed a bug in `calc_Z` where `grep` was used to drop `biolprm` entries containing a `#`, which broke when there were none. After the fix, `calc_Z` ran for the survey resolution numbers. I then went back and ran the two other levels above.

Both true full resolution numbers and survey resolution numbers (aggregated over layers) returned nonzero Z values. As with the previous NOBA test, they are identical aside from eight values out of 500 (rows 100, 105, 180, 200, 220, 240, 355, 400), with no apparent pattern, that differ on the order of 1e-16. The same holds for the difference from sample resolution; this does not matter to output.

```{r differentZ}
fullresZ$atoutput - surveyresZ$atoutput

fullresZ$atoutput - sampleresZ$atoutput
```

I would be happier if the differences were all exactly 0, but I suppose we can attribute that to rounding error.
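Differences of this size are what double precision arithmetic typically produces when the same numbers are summed in a different order, which is plausibly what happens when numbers are pre-aggregated at different resolutions. A small sketch, unrelated to the Atlantis data, of comparing with a tolerance rather than exactly:

```r
# The same three numbers added in two different orders
sum_order1 <- (0.1 + 0.2) + 0.3
sum_order2 <- 0.1 + (0.2 + 0.3)

# Exact comparison is fragile for doubles
sum_order1 == sum_order2

# Size of the discrepancy relative to machine precision
abs(sum_order1 - sum_order2)
.Machine$double.eps

# Tolerance-based comparison is the safer check
isTRUE(all.equal(sum_order1, sum_order2))
```

Applied here, `all.equal(fullresZ$atoutput, surveyresZ$atoutput)` would be the analogous check.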

### What is F at each timestep?
A relative annual F is found in outputMort.txt!
```{r getrelF}

file.mort <- file.path(d.name, paste0("output", scenario.name, "Mort.txt"))

mortish <- read.table(file.mort, header = TRUE)

relF_ss <- mortish %>%
  select(Time, relF = paste0(code_ss, ".F"))

relM_ss <- mortish %>%
  select(Time, relM = paste0(code_ss, ".M"))

plotF <- ggplot(data=relF_ss, aes(x=Time/365, y=relF)) +
  geom_line() +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotF

```

This is consistent with the scenario implemented. So far so good.

This probably depends on how the model is set up, but for models forced with F by species we ought to be able to read this out from an input file. If it is a daily rate then we multiply by toutinc to get the rate over each of the output timesteps used for these Z values.
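A minimal sketch of that conversion, assuming a constant daily instantaneous rate (the value `F_daily_d` is made up, and `toutinc_d` is a stand-in for `runpar$toutinc`):

```r
toutinc_d <- 73       # output timestep in days (assumed)
F_daily_d <- 0.001    # hypothetical daily instantaneous fishing mortality

# Instantaneous rates are additive over time, so scale by the number of days
F_step_d   <- F_daily_d * toutinc_d
F_annual_d <- F_daily_d * 365

# Consistency check: surviving five 73-day steps equals surviving one year
surv_steps_d  <- exp(-F_step_d)^(365 / toutinc_d)
surv_annual_d <- exp(-F_annual_d)

c(F_step = F_step_d, F_annual = F_annual_d)
all.equal(surv_steps_d, surv_annual_d)
```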

### Quick test to see if Z is right?
Compare with outputMort.txt!
```{r compZ}

Zish <- merge(relF_ss,relM_ss) %>%
  mutate(relZ = relF + relM)

plotZ <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ)) +
  geom_point(data=fullresZ, aes(x=(time*runpar$toutinc)/365, y=atoutput)) +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotZ
```

After several iterations of debugging, we are still not matching the pattern, but the result has improved in that it is no longer a constant punctuated by occasional high values. The variation still doesn't look anything like the txt file either. The five output timesteps per year are probably messing this up?

I've commented out the detailed debugging block below, but check out the rmd file if you are interested in the sausage making. Changes have been implemented in `calc_Z` so that it now apportions annual recruitment to timesteps and recruitment is correctly aligned with truth nums output.

However, it is not clear that subtracting off "recruitment" that has not been subject to mortality is appropriate when the output numbers at each timestep have been subject to mortality. I guess that was the input in each timestep as best we know it, but this is at best an approximation.
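The survival calculation can be illustrated with a toy population. This is a sketch under a strong assumption, namely that recruits are added at the end of a timestep and experience no mortality within it, which is exactly the approximation questioned above. All values are made up.

```r
z_true_e <- 0.1                      # assumed constant Z per timestep
rec_e    <- c(0, 0, 2e5, 0, 0)       # hypothetical recruits arriving in steps 1 to 5

# Simulate numbers: survivors of the previous step plus new recruits
n_e <- numeric(length(rec_e) + 1)
n_e[1] <- 1e6
for (t in seq_along(rec_e)) {
  n_e[t + 1] <- n_e[t] * exp(-z_true_e) + rec_e[t]
}

# Survival with recruits subtracted from the later count, then Z = -log(survival)
surv_hat_e <- (n_e[-1] - rec_e) / n_e[-length(n_e)]
z_hat_e    <- -log(surv_hat_e)

# Ignoring recruitment gives a negative "Z" in the recruitment step
z_naive_e <- -log(n_e[-1] / n_e[-length(n_e)])

round(cbind(z_hat = z_hat_e, z_naive = z_naive_e), 3)
```

If recruits actually suffer some mortality before the output snapshot, subtracting the full input recruitment removes too many fish and inflates the estimated Z for that step.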
<!--
```{r calczsnippet, eval=FALSE}

#corrections listed here have been applied in calc_Z; this code not run

# test with one species at survey scale (layers aggregated, not time or polygon)
yoy <- YOY_ss
nums <- survey_testN_ss
fgs <- funct.groups
biolprm <- biol
toutinc <- runpar$toutinc

# everything below the function definition line
  # subset the yoy for species included in the fgs file
  # that are turned on
  # colnames of the recruit data are "Time" and a column for each
  # species where the name is the functional group with an additional .0
  turnedon <- fgs[fgs$IsTurnedOn > 0, ]
  recruits <- yoy[, colnames(yoy) %in% c("Time", paste0(turnedon$Code, ".0"))]

  # mg carbon converted to wet weight in tonnes
  k_wetdry <- biolprm$kgw2d / 1000000000
  # Sum of structural and reserve nitrogen (KWSR_RN + KWSR_SN)
  nitro <- merge(biolprm$kswr, biolprm$kwrr, by = "1")
  nitro$sum <- apply(nitro[, 2:3], 1, sum)

  # legacy: If output is from legacy code there will be an error in the
  # yoy data, where the first row is in a different unit.
  # yoy.txt is the biomass in tonnes per spawning event summed over the total
  # model domain.
  # The first row (< Nov/Dec 2015) is stored as biomass and the remaining rows
  # are stored in numbers, must convert the entire matrix to biomass
  # Check if legacy code and if so convert the numbers to biomass

  # G.Fay 2/21/16 : changed yoy to recruits in below loop.
  if (abs(recruits[1, 2] / recruits[2, 2]) > 10) {
    recruits[2:NROW(recruits), 2:NCOL(recruits)] <- recruits[2:NROW(recruits), 2:NCOL(recruits)] *
    nitro[match(gsub(".0", "", colnames(recruits)[-1]), nitro[, 1]), "sum"]
  }


  # Wide to long
  recruits <- reshape(data = recruits, direction = "long",
    varying = colnames(recruits)[-1],
    v.names = "recruits",
    times = colnames(recruits)[-1],
    timevar = "group")
  rownames(recruits) <- 1:NROW(recruits)
  recruits <- recruits[, -which(colnames(recruits) == "id")]
  # Switch from species code to species
  recruits$group <- gsub("\\.0", "", recruits$group)
  recruits <- merge(recruits, fgs[, c("Code", "Name")],
    by.x = "group", by.y = "Code")

  # merge recruits with strucn and resn of recruits from biol.prm file
  recruits <- merge(recruits, nitro[, c(1, 4)],
    by.x = "group", by.y = "1", all.x = TRUE, all.y = FALSE)
  colnames(recruits)[which(colnames(recruits) == "recruits")] <- "recruitsbio"
  # Get recruits in numbers rather than biomass
  # these match rows 2:end of the legacy file, so good
  # but is this 1000s or numbers? not real numbers, Gavin corrects below
  recruits$recruits <- recruits$recruitsbio / recruits$sum
  recruits$yr <- as.integer(round(recruits$Time)/365)

  #Isaac determined that yr 0 (Time 0) in the YOY file is not real, don't use it
  #below assigns year 1 to times 0:stepperyr, all good output timesteps in truth$nums

  # G.Fay 2/21/16
  # UGLY code below tries to align fraction of annual yoy with timing of recruitment
  # values in YOY.txt are total YOY that year waiting to recruit.
  # seems to get rid of most of 'issues' - some v.minor survival >1,
  #perhaps due to averaging of survival over toutinc days
  nyrs <- ceiling(max(yoy$Time)/365)
  times <- unique(yoy$Time)

  # SKG June 2019
  # need to match codes below, this mismatches when biolprm not in same order!!
  #recstart_temp <- biolprm$recruit_time
  #recstart_temp[,2] <- biolprm$time_spawn[-(grep('#',
  #                     biolprm$time_spawn[,1])),2] + biolprm$recruit_time[,2]
  # was this grep to get rid of a species with #XXX? breaks when there are none
  #recstart_temp[,2] <- biolprm$time_spawn[,2] + biolprm$recruit_time[,2]

  #recstart_temp <- recstart_temp[recstart_temp[,1]%in%turnedon$Code,] #bug added codes with no YOY output
  #recstart_temp <- recstart_temp[recstart_temp[,1]%in%recruits$group,]

  # need spawn_period?? if so read out in load_biolprm and merge it here too

  # rectiming is a dataframe with species code, time_spawn (day of year),
  # recruit_time (number of days), and recruit period (number of days).
  # we use these to calculate recstart (day of year) and recend (day of year)
  rectiming <- merge(biolprm$time_spawn, biolprm$recruit_time, by = 1)
  rectiming <- merge(rectiming, biolprm$recruit_period, by = 1)
  names(rectiming) <- c("Code", "time_spawn", "recruit_time", "recruit_period")
  rectiming <- rectiming %>%
    mutate(recstart = time_spawn + recruit_time) %>% # possibly + spawn_period
    mutate(recend = recstart + recruit_period)

  #subsets to groups of interest
  recstart_temp <- rectiming[rectiming$Code %in% recruits$group,]

  # do we need this fraction calculation if YOY is annual snapshot?
  # (check output timing of YOY file with tsumout, available in run.prm)
  # can we not just allocate the full YOY to the timestep where they come in?
  # if recstart to recend spans multiple toutinc timesteps then need fraction

  # align model output timesteps (days) with recruitment periods (days)
  #numstime.days <- unique(nums$time)*toutinc
  # Sum numbers output over all boxes/depth/cohorts
  totnums <- aggregate(atoutput ~ species + time, data = nums, sum) %>%
    mutate(time.days = (time+1)*toutinc) %>% #makes time 0 into days 0->73, etc
    mutate(yr = ceiling(time.days/365))  # yr 1 is 0:stepsperyr to match recruits yr1
    #mutate(yr = floor(time.days/365)) # includes 0 value, yr 1 is

  totnums <- merge(totnums, recruits,
    by.x = c("yr", "species"), by.y = c("yr", "Name"),
    all.x = TRUE) %>%
    arrange(time)

  totnums$frac_recruit <- 0

  for (irow in 1:nrow(recstart_temp)) {
    group <- recstart_temp$Code[irow]
    pick <- which(totnums$group == group)

    recstart <- seq(recstart_temp$recstart[irow],by=365,length.out=nyrs)
    recstart <- recstart[recstart<max(totnums$Time[pick])]
    recend <- recstart + recstart_temp$recend[irow]
    #rec_times <- rbind(rec_times,cbind(group,recstart,recend))

    for (i_rec in 1:length(recstart)) {
      i_tstart <- which(pick==min(pick[totnums$time.days[pick]>=recstart[i_rec]]))
      i_tstop <- which(pick==min(pick[totnums$time.days[pick]>=recend[i_rec]]))
      if (i_rec == length(recstart)) i_tstop <- length(pick)
      n_t <- 1+i_tstop-i_tstart
      for (i_t in 1:n_t) {
        t_temp <- totnums$time.days[pick[i_tstart+i_t-1]]
        num_temp <- t_temp - recstart[i_rec]
        if (i_t>1) {
          if ((recend[i_rec]-t_temp)>toutinc) {
            num_temp <- toutinc
          }
          else {
            num_temp <- recend[i_rec]-(totnums$time.days[pick[i_tstart+i_t-2]])
          }
        }
        frac_temp <- max(c(0,num_temp /(recend[i_rec]-recstart[i_rec])))
        totnums$frac_recruit[pick[i_tstart+i_t-1]] <- frac_temp
      }
    }
  }

# Time in YOY does not match recstart-recend period still after this loop--fixed

  # match "Time" of the young of the year with the time-step periodicity
  # listed in the run.prm or run.xml file
  #recruits$Time <- recruits$Time / toutinc

# SKG did this above already in the following line?
  # Get recruits in numbers rather than biomass
  #recruits$recruits <- recruits$recruitsbio / recruits$sum

  #G.Fay 1/6/16, expand to num of recruits
  # Recruit / mg C converted to wet weight in tonnes / redfield ratio of C:N
  totnums$recruits <- totnums$recruits/k_wetdry/biolprm$redfieldcn


  # Combine recruits and numbers
  # Only pull recruits from the yearly time step, where the
  # yearly time step matches the time step
  # totnums <- merge(totnums, recruits,
  #   by.x = c("time", "species"), by.y = c("Time", "Name"),
  #   all.x = TRUE, all.y = FALSE)
  totnums$group <- recruits$group[match(totnums$species, recruits$Name)]
  # For all time increments where there
  totnums$recruits[is.na(totnums$recruits)] <- 0

   # Make sure time is in order, did above
  #totnums <- totnums[order(totnums$species, totnums$time), ]
  totnums$annrecruits <- totnums$recruits

  totnums$recruits <- totnums$annrecruits*totnums$frac_recruit

  # Calculate survivors for each species group
  totnums$survivors <- totnums$recruits

  totnums$survival <- totnums$survivors
  # Calculate survival for each group
  for (group in unique(totnums$group)) {
    #if(group == "SHD") browser()
    pick <- which(totnums$group == group)
    survival_temp <- c(NA,
      totnums$survivors[pick[-1]]/totnums$atoutput[pick[-length(pick)]])

    # G.Fay 2/21/16  "think" this is what things should be, recruits don't show up
    # in numbers at age until time step after the recruitment event.
     survival_temp <- c(
       (totnums$atoutput[pick[-1]]-
          totnums$recruits[pick[-1]])/totnums$atoutput[pick[-length(pick)]],NA)

#     survival_temp <- c(
#       (totnums$atoutput[pick[-1]])/
#          (totnums$recruits[pick[-1]]+totnums$atoutput[pick[-length(pick)]]),NA)

    survival_temp[survival_temp < 0] <- NA
    # Use first positive value to replace the initial year and all negative vals
    firstgood <- which(!is.na(survival_temp))[1]

    survival_temp[1:firstgood] <- survival_temp[firstgood]
    for(ii in seq_along(survival_temp)) {
      if (is.na(survival_temp[ii])) {
        nonzero <- which(which(survival_temp > 0) > ii)
        if (length(nonzero) == 0) nonzero <- which(survival_temp > 0)
        survival_temp[ii] <- survival_temp[which.min(abs(nonzero - ii))]
      }
    }
    totnums$survival[pick] <- survival_temp
   }

  #Calculate Z
  totnums$Z <- -1 * log(totnums$survival)
  finaldata <- data.frame("species" = totnums$species,
    "agecl" = NA, "polygon" = NA, "layer" = NA,
    "time" = totnums$time, "atoutput" = totnums$Z)

surveyresZ <- finaldata


```
-->

### How to turn timestep Z into annual Z for proper comparison?

There are two approaches: (1) take the numbers at the end of each year, calculate survival between years (after subtracting annual recruitment), and convert that to an annual Z; or (2) sum the timestep Zs within each year to get an annual Z.
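Without recruitment the two approaches are algebraically the same, because the within-year log survival ratios telescope. A toy check with made-up numbers (six snapshots spanning one year of five timesteps):

```r
# Hypothetical total numbers at the start of the year and after each of 5 steps
n_step_f <- c(1000, 900, 780, 700, 610, 500)

# Approach 2: Z for each timestep, summed over the year
z_step_f <- -log(n_step_f[-1] / n_step_f[-length(n_step_f)])
z_sum_f  <- sum(z_step_f)

# Approach 1: Z from the start and end of year snapshots only
z_endyr_f <- -log(n_step_f[length(n_step_f)] / n_step_f[1])

c(sum_timestep = z_sum_f, end_of_year = z_endyr_f)
all.equal(z_sum_f, z_endyr_f)
```

Once recruitment is subtracted at different temporal resolutions the identity no longer holds exactly, so some disagreement between the two annual estimates is to be expected.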

First an annual estimate based on end year numbers snapshots:

```{r annZ1}

totnums <- aggregate(atoutput ~ species + time, data = truenums_ss, sum) %>%
    mutate(time.days = (time+1)*runpar$toutinc) %>% #makes time 0 into days 0->73, etc
    mutate(yr = ceiling(time.days/365))  # yr 1 is 0:stepsperyr to match recruits yr1
    #mutate(yr = floor(time.days/365)) # includes 0 value, yr 1 is

#only want numbers at end of year, snapshot
totnumsann <- totnums %>%
  filter(time.days %in% seq(365, max(time.days), by=365))

# mg carbon converted to wet weight in tonnes
k_wetdry <- biol$kgw2d / 1000000000

# WARNING only works for CCA because YOY.txt rows 2:end are already in numbers
# and we are merging out the incorrect and irrelevant YOY row 1 (Time=0)
# also hardcoded for sardine example
recnums <- YOY_ss %>%
  mutate(yr = as.integer(round(YOY_ss$Time)/365)) %>%
  mutate(recnums = SAR.0/k_wetdry/biol$redfieldcn)

annZ1 <- merge(totnumsann, recnums) %>%
  mutate(recnums = replace_na(recnums, 0)) %>%
  mutate(numslessrec = atoutput - recnums) %>%
  mutate(surv = numslessrec/lag(atoutput, default = first(numslessrec))) %>%
  mutate(Z = -1 * log(surv))

plotZ <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ)) +
  geom_point(data=annZ1, aes(x=Time/365, y=Z)) +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotZ + ylim(0, 2.0)

```

This looks better; we are starting to see a pattern.

Second summing within year Z from calc_Z:

```{r annZ2}

annZ2 <- fullresZ %>%
  mutate(yr = floor(time/stepperyr)+1) %>%
  group_by(species, yr) %>%
  summarise(Z = sum(atoutput))

plotZ <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ)) +
  geom_point(data=annZ2, aes(x=yr, y=Z)) +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotZ + ylim(0, 2.0)


```

Also looks better. Not sure which is more correct.

Compare methods:

```{r compZall}

plotZ <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ, color="mort.txt Z")) +
  geom_point(data=annZ1, aes(x=Time/365, y=Z, color="endyr Z")) +
  geom_point(data=annZ2, aes(x=yr, y=Z, color="sum timestep Z")) +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotZ + ylim(0, 2.0)


```

Both seem to be tracking each other reasonably, and also general trends from mort.txt output. Discussion with Beth suggests that mort.txt output is an initial calculation combining deaths due to predation with deaths due to fishing, but that there is rescaling of predation (and possibly fishing?) after this output stage such that mort.txt does not provide final mortality. If the rescaling varies by timestep, then matching general patterns rather than the full interannual variability is probably the best we can do, since we don't know the full interannual variability.

We can probably conclude that the `calc_Z` function is working as correctly as possible now, given that it is an approximation no matter what, and the differences we see here are likely from temporal resolution of the Z estimate.

So what is M? This is an approximate annual M as Z-F from mort.txt (which assumes F is not rescaled!). If we need annual and age specific M then we need a new Z calc function that is cohort specific I think.

```{r isthisM, message=FALSE, warning=FALSE}

annM1 <- merge(annZ1, relF_ss, all.x = TRUE) %>%
  mutate(M = Z-relF)

annZ2_Time <- annZ2 %>%
  mutate(Time = yr*365)

annM2 <- merge(annZ2_Time, relF_ss) %>%
  mutate(M = Z-relF)

plotM <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ, color="mort.txt Z")) +
  geom_line(data=Zish, aes(x=Time/365, y=relM, color="mort.txt M")) +
  geom_point(data=annM1, aes(x=Time/365, y=M, color="M = endyr Z-F")) +
  geom_point(data=annM2, aes(x=yr, y=M, color="M = sum timestep Z-F")) +
  theme_tufte() +
  theme(legend.position = "bottom", legend.box = "horizontal") +
  scale_color_discrete(NULL) +
  guides(color = guide_legend(nrow = 1)) +
  labs(subtitle = paste(scenario.name, species_ss))

plotM + ylim(0, 2.0)


```

So the problem here is we get negative M if we use our calculated Z from annual numbers at age or timestep numbers at age and subtract the F from mort.txt. (Negative M does not show up as points above because I restrict the y axis scale to 0-2.0.) The F from mort.txt is actually higher than our estimated Z. So Atlantis must rescale F too???
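Because the plot's y-axis limits hide them, it can help to list the negative values explicitly. A sketch with made-up Z and F values (the `toy_mort` data frame is hypothetical; with the real objects the same filter could be applied to `annM1` or `annM2`):

```r
# Hypothetical annual Z estimates and relative F values
toy_mort <- data.frame(yr   = 1:5,
                       Z    = c(0.60, 0.50, 0.70, 0.40, 0.55),
                       relF = c(0.10, 0.20, 0.90, 0.10, 0.60))

# M as the remainder of Z after removing F
toy_mort$M <- toy_mort$Z - toy_mort$relF

# Flag years where F exceeds Z, which gives an impossible negative M
toy_mort$negM <- toy_mort$M < 0
toy_mort[toy_mort$negM, ]
```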

Need to investigate backing out F...

Let's see how it performs with another model.

### Test calc_Z NOBA

Might be a good time to switch back to NOBA for comparison with true annual ages.

```{r switch-NOBA}

initCCA <- FALSE
initNEUS <- FALSE
initNOBA <- TRUE

if(initNOBA) source(here("config/NOBAaaConfig.R"))

species_ss <- "North_atl_cod"

#Load functional groups
funct.groups <- load_fgs(dir=d.name,
                         file_fgs = functional.groups.file)
#Get just the names of active functional groups
funct.group.names <- funct.groups %>%
  filter(IsTurnedOn == 1) %>%
  select(Name) %>%
  .$Name

#Get true NOBAaa
if(!file.exists(file.path(d.name,
                          paste0("output", scenario.name, "run_truth.RData")))){
  #Store all loaded results into an R object
  truth <- run_truth(scenario = scenario.name,
                     dir = d.name,
                     file_fgs = functional.groups.file,
                     file_bgm = box.file,
                     select_groups = funct.group.names,
                     file_init = initial.conditions.file,
                     file_biolprm = biol.prm.file,
                     file_runprm = run.prm.file
  )
} else{
  truth <- get(load(file.path(d.name,
                              paste0("output", scenario.name, "run_truth.RData"))))
}

source(here("config/census_spec.R"))

```

Let's try just the Z part for NOBA:

```{r Znoba}

# make a function for this
# add YOY file to the config files
YOY <- load_yoy(d.name, paste0("output", scenario.name, "YOY.txt"))

# load biolprm in some initialize file?
biol <- load_biolprm(d.name, biol.prm.file)

# get code matching species name to split YOY file
code_ss <- funct.groups$Code[which(funct.groups$Name == species_ss)]

# cut to a single species in YOY file
YOY_ss <- YOY %>%
  select(Time, paste0(code_ss, ".0"))

# numbers at agecl at full resolution (all polygons and layers)
truenums_ss <- truth$nums[truth$nums$species == species_ss,]

#calc_Z <- function(yoy, nums, fgs, biolprm, toutinc)

#oops, need to generalize calc_Z for subannual timesteps in YOY!
#or input YOY only at 0, 365, etc since the numbers repeat for timesteps
YOY_ss <- YOY_ss %>%
  filter(Time %in% seq(0, max(Time), by=365))

fullresZ <- calc_Z(yoy = YOY_ss,
                   nums = truenums_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

# compare as above with mort.txt output
file.mort <- file.path(d.name, paste0("output", scenario.name, "Mort.txt"))

mortish <- read.table(file.mort, header = TRUE)

relF_ss <- mortish %>%
  select(Time, relF = paste0(code_ss, ".F"))

relM_ss <- mortish %>%
  select(Time, relM = paste0(code_ss, ".M"))

Zish <- merge(relF_ss,relM_ss) %>%
  mutate(relZ = relF + relM)

annZ2 <- fullresZ %>%
  mutate(yr = floor(time/stepperyr)+1) %>%
  group_by(species, yr) %>%
  summarise(Z = sum(atoutput))

plotZ <- ggplot() +
  geom_line(data=Zish, aes(x=Time/365, y=relZ, color="mort.txt Z")) +
  geom_point(data=fullresZ, aes(x=time/stepperyr, y=atoutput, color="each timestep Z")) +
  geom_point(data=annZ2, aes(x=yr-1, y=Z, color="sum timestep Z")) +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, species_ss))

plotZ + ylim(-1, 10.0)

```

This Z is a bad match. The Z values in NOBA's mort.txt are huge. Not sure what to make of this.
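For reference, the "sum timestep Z" points in the plot add up the per-timestep Z values within each year. That is a valid way to get an annual Z because survival multiplies across timesteps, so the logs add. A minimal sketch with made-up survival fractions (five output steps per year is an assumption here, not a value read from the run):

```r
# Toy per-timestep survival fractions for one year (made-up values)
survival_step <- c(0.95, 0.90, 0.97, 0.92, 0.96)

# Instantaneous total mortality for each output timestep
z_step <- -log(survival_step)

# Annual Z from summing the timestep values ...
z_annual_sum <- sum(z_step)

# ... is the same as annual Z from the product of stepwise survival
z_annual_prod <- -log(prod(survival_step))

isTRUE(all.equal(z_annual_sum, z_annual_prod))
```

So a mismatch with mort.txt at the annual scale is not caused by the summing step itself.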

The figure above is now the best it is going to get, after a deeper dive back into `calc_Z` for NOBA before we proceed to true ages. That dive found two bugs: one that was hidden because sardine's recruit_period and recend are equal, and another that occurs when the recruitment period falls entirely within one output timestep, which produced proportions over 1 (inflating the recruitment and then subtracting it off). Further, the timesteps for recruitment were still not correctly aligned, so I am trying a different alignment that leaves timestep 0 out of the truth. I am not sure this last change is correct, or whether we need to rethink the calculation of survival based on when recruits actually enter the model.

Update: the new code actually makes the CCA sardine Z estimates more in line with each other, so the updates are correct and we are keeping them in `calc_Z`.
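To illustrate the "proportions over 1" problem in isolation, here is a minimal sketch of one way to split a recruitment window across output timesteps so that the fractions can never exceed 1. This is not the alignment code in `calc_Z`; the helper name `frac_in_step`, the 73-day output step, and the window dates are all hypothetical, and the sketch assumes `recend > recstart`.

```r
# Hypothetical helper: fraction of the recruitment window [recstart, recend]
# (days) that falls inside each output interval (step_end - toutinc, step_end]
frac_in_step <- function(step_end, toutinc, recstart, recend) {
  step_start <- step_end - toutinc
  # days of overlap between each output interval and the recruitment window
  overlap <- pmin(step_end, recend) - pmax(step_start, recstart)
  overlap <- pmax(overlap, 0)
  # overlap cannot exceed the window length, so fractions stay within [0, 1]
  overlap / (recend - recstart)
}

# End day of each output timestep in one year (assumed 73-day steps)
step_end <- seq(73, 365, by = 73)

# Recruitment window spanning two output timesteps (days 100 to 160)
frac_span <- frac_in_step(step_end, toutinc = 73, recstart = 100, recend = 160)

# Recruitment window entirely within one output timestep (days 80 to 90)
frac_within <- frac_in_step(step_end, toutinc = 73, recstart = 80, recend = 90)
```

In both cases the fractions sum to 1 over the year, and the window that sits inside a single timestep gets a fraction of exactly 1 rather than something larger.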

Most important: Beth says that the M in mort.txt can really be as far off as we are seeing it here. The green line in the above NOBA Z plot reflects the initial estimate of what dies by predation mortality, which is a gigantic overestimate because many availability parameters are set to 1. After this output step, the predation mortality is rescaled within the model to account for the fact that there really aren't that many to be eaten.

Our estimate of Z is from the actual numbers alive at the end of each timestep and the known input recruitment prior to any mortality, so I think it really is the best estimate we have. More debugging code is commented out below; see the Rmd file for the gory and redundant details.
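As a rough sketch of that calculation: survival between two output timesteps is the later count with that step's new recruits removed, divided by the earlier count, and Z is the negative log of survival. The numbers below are made up for illustration and the variable names are not those used in `calc_Z`.

```r
# Made-up total numbers alive at the end of each output timestep
n_alive <- c(1000, 900, 1150, 1030, 930)

# Made-up recruits added before each count (zero outside the recruitment window)
recruits <- c(0, 0, 350, 0, 0)

n_prev <- n_alive[-length(n_alive)]  # numbers at timestep t
n_next <- n_alive[-1]                # numbers at timestep t + 1
r_next <- recruits[-1]               # recruits counted in timestep t + 1

# Survival of animals already present: remove new recruits from the later count
survival <- (n_next - r_next) / n_prev

# Total mortality for each timestep
z_step <- -log(survival)
```

Without subtracting the 350 recruits, the third count would imply survival above 1 and a negative Z, which is why the timing of recruitment matters so much here.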
<!--
```{r calcZ-snippet2, eval=FALSE}

survey_testNall <- readRDS(file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# numbers at agecl aggregated over layers, retain polygons (output of create_survey)
survey_testN_ss <- survey_testNall[survey_testNall$species == species_ss,]

yoy <- YOY_ss
nums <- survey_testN_ss
fgs <- funct.groups
biolprm <- biol
toutinc <- runpar$toutinc

#calc_Z <- function(yoy, nums, fgs, biolprm, toutinc) {
  # subset the yoy for species included in the fgs file
  # that are turned on
  # colnames of the recruit data are "Time" and a column for each
  # species where the name is the functional group with an additional .0
  turnedon <- fgs[fgs$IsTurnedOn > 0, ]
  recruits <- yoy[, colnames(yoy) %in% c("Time", paste0(turnedon$Code, ".0"))]

  # mg carbon converted to wet weight in tonnes
  k_wetdry <- biolprm$kgw2d / 1000000000
  # Sum of structural and reserve nitrogen (KWSR_RN + KWSR_SN)
  nitro <- merge(biolprm$kswr, biolprm$kwrr, by = "1")
  nitro$sum <- apply(nitro[, 2:3], 1, sum)

  # legacy: If output is from legacy code there will be an error in the
  # yoy data, where the first row is in a different unit.
  # yoy.txt is the biomass in tonnes per spawning event summed over the total
  # model domain.
  # The first row (< Nov/Dec 2015) is stored as biomass and the remaining rows
  # are stored in numbers, must convert the entire matrix to biomass
  # Check if legacy code and if so convert the numbers to biomass

  # G.Fay 2/21/16 : changed yoy to recruits in below loop.
  if (abs(recruits[1, 2] / recruits[2, 2]) > 10) {
    recruits[2:NROW(recruits), 2:NCOL(recruits)] <- recruits[2:NROW(recruits), 2:NCOL(recruits)] *
    nitro[match(gsub("\\.0$", "", colnames(recruits)[-1]), nitro[, 1]), "sum"]
  }


  # Wide to long
  recruits <- reshape(data = recruits, direction = "long",
    varying = colnames(recruits)[-1],
    v.names = "recruits",
    times = colnames(recruits)[-1],
    timevar = "group")
  rownames(recruits) <- 1:NROW(recruits)
  recruits <- recruits[, -which(colnames(recruits) == "id")]
  # Switch from species code to species
  recruits$group <- gsub("\\.0", "", recruits$group)
  recruits <- merge(recruits, fgs[, c("Code", "Name")],
    by.x = "group", by.y = "Code")

  # merge recruits with strucn and resn of recruits from biol.prm file
  recruits <- merge(recruits, nitro[, c(1, 4)],
    by.x = "group", by.y = "1", all.x = TRUE, all.y = FALSE)
  colnames(recruits)[which(colnames(recruits) == "recruits")] <- "recruitsbio"
  # Get recruits in numbers rather than biomass
  recruits$recruits <- recruits$recruitsbio / recruits$sum
  recruits$yr <- as.integer(round(recruits$Time)/365) #needed to merge with totnums

  # June 2019:Isaac determined that yr 0 (Time 0) in the YOY file is not real, don't use it
  # code below assigns year 1 to times 0:stepperyr, all good output timesteps in truth$nums


  # G.Fay 2/21/16
  # UGLY code below tries to align fraction of annual yoy with timing of recruitment
  # values in YOY.txt are total YOY that year waiting to recruit.
  # seems to get rid of most of 'issues' - some v.minor survival >1,
  #perhaps due to averaging of survival over toutinc days
  nyrs <- ceiling(max(yoy$Time)/365)
  times <- unique(yoy$Time)

  # SKG June 2019
  # need to match codes below, this mismatches when biolprm not in same order!!
  #recstart_temp <- biolprm$recruit_time
  #recstart_temp[,2] <- biolprm$time_spawn[-(grep('#',
  #                     biolprm$time_spawn[,1])),2] + biolprm$recruit_time[,2]
  # was this grep to get rid of a species with #XXX? breaks when there are none
  #recstart_temp[,2] <- biolprm$time_spawn[,2] + biolprm$recruit_time[,2]

  #recstart_temp <- recstart_temp[recstart_temp[,1]%in%turnedon$Code,] #bug added codes with no YOY output
  #recstart_temp <- recstart_temp[recstart_temp[,1]%in%recruits$group,]

  # June 12 2019 Beth determined that we do not need spawn_period

  # rectiming is a dataframe with species code, time_spawn (day of year),
  # recruit_time (number of days), and recruit period (number of days).
  # we use these to calculate recstart (day of year) and recend (day of year)
  rectiming <- merge(biolprm$time_spawn, biolprm$recruit_time, by = 1)
  rectiming <- merge(rectiming, biolprm$recruit_period, by = 1)
  names(rectiming) <- c("Code", "time_spawn", "recruit_time", "recruit_period")
  rectiming <- rectiming %>%
    mutate(recstart = time_spawn + recruit_time) %>%
    mutate(recend = recstart + recruit_period)

  #subsets to groups of interest
  recstart_temp <- rectiming[rectiming$Code %in% recruits$group,]

  # Sum numbers output over all boxes/depth/cohorts
  # align model output timesteps (days) with recruitment periods (days)
  totnums <- aggregate(atoutput ~ species + time, data = nums, sum) %>%
    #mutate(time.days = (time+1)*toutinc) %>% #makes time 0 into days 0->73, etc
    mutate(time.days = (time)*toutinc) %>% #makes time 1 into days 0->73, etc
    mutate(yr = ceiling(time.days/365))  # yr 1 is 0:stepsperyr to match recruits yr1

  totnums <- merge(totnums, recruits,
                   by.x = c("yr", "species"), by.y = c("yr", "Name"),
                   all.x = TRUE) %>%
    arrange(time)

  totnums$frac_recruit <- 0

  for (irow in 1:nrow(recstart_temp)) {
    group <- recstart_temp$Code[irow]
    pick <- which(totnums$group == group)

    recstart <- seq(recstart_temp$recstart[irow],by=365,length.out=nyrs)
    recstart <- recstart[recstart<max(totnums$Time[pick])]
    recend <- recstart + recstart_temp$recruit_period[irow]
    #rec_times <- rbind(rec_times,cbind(group,recstart,recend))

    for (i_rec in 1:length(recstart)) {
      i_tstart <- which(pick==min(pick[totnums$time.days[pick]>=recstart[i_rec]]))
      i_tstop <- which(pick==min(pick[totnums$time.days[pick]>=recend[i_rec]]))
      if (i_rec == length(recstart)) i_tstop <- length(pick)
      n_t <- 1+i_tstop-i_tstart
      for (i_t in 1:n_t) {
        t_temp <- totnums$time.days[pick[i_tstart+i_t-1]]
        num_temp <- t_temp - recstart[i_rec]
        if (i_t>1) {
          if ((recend[i_rec]-t_temp)>toutinc) {
            num_temp <- toutinc
          }
          else {
            num_temp <- recend[i_rec]-(totnums$time.days[pick[i_tstart+i_t-2]])
          }
        }
        frac_temp <- max(c(0,num_temp /(recend[i_rec]-recstart[i_rec])))
        if(frac_temp > 1.0) frac_temp = 1.0
        totnums$frac_recruit[pick[i_tstart+i_t-1]] <- frac_temp
      }
    }
  }

  #G.Fay 1/6/16, expand to num of recruits
  # Recruit / mg C converted to wet weight in tonnes / redfield ratio of C:N
  totnums$recruits <- totnums$recruits/k_wetdry/biolprm$redfieldcn

  totnums$group <- recruits$group[match(totnums$species, recruits$Name)]
  # For all time increments where there are no recruits, set recruits to 0
  totnums$recruits[is.na(totnums$recruits)] <- 0

  totnums$annrecruits <- totnums$recruits

  totnums$recruits <- totnums$annrecruits*totnums$frac_recruit

  # Calculate survivors for each species group
  totnums$survivors <- totnums$recruits

  totnums$survival <- totnums$survivors
  # Calculate survival for each group
  for (group in unique(totnums$group)) {
    #if(group == "SHD") browser()
    pick <- which(totnums$group == group)
    survival_temp <- c(NA,
      totnums$survivors[pick[-1]]/totnums$atoutput[pick[-length(pick)]])

    # G.Fay 2/21/16  "think" this is what things should be, recruits don't show up
    # in numbers at age until time step after the recruitment event.
     survival_temp <- c(
       (totnums$atoutput[pick[-1]]-
          totnums$recruits[pick[-1]])/totnums$atoutput[pick[-length(pick)]],NA)

#     survival_temp <- c(
#       (totnums$atoutput[pick[-1]])/
#          (totnums$recruits[pick[-1]]+totnums$atoutput[pick[-length(pick)]]),NA)

    survival_temp[survival_temp < 0] <- NA
    # Use first positive value to replace the initial year and all negative vals
    firstgood <- which(!is.na(survival_temp))[1]

    survival_temp[1:firstgood] <- survival_temp[firstgood]
    for(ii in seq_along(survival_temp)) {
      if (is.na(survival_temp[ii])) {
        nonzero <- which(which(survival_temp > 0) > ii)
        if (length(nonzero) == 0) nonzero <- which(survival_temp > 0)
        survival_temp[ii] <- survival_temp[which.min(abs(nonzero - ii))]
      }
    }
    totnums$survival[pick] <- survival_temp
   }

  #Calculate Z
  totnums$Z <- -1 * log(totnums$survival)
  finaldata <- data.frame("species" = totnums$species,
    "agecl" = NA, "polygon" = NA, "layer" = NA,
    "time" = totnums$time, "atoutput" = totnums$Z)

  #return(finaldata)
#}


```
-->

## Biological sampling: true numbers at age class

The number of true ages per age class is stored in the [...]groups[...].csv file, which is read in using `load_fgs`. We read this in as funct.groups above. The column NumAgeClassSize stores the number of true (annual) ages in each age class:

Number of true ages per age class for Cod: `r funct.groups$NumAgeClassSize[funct.groups$Name=="North_atl_cod"] `
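As a rough illustration of how age classes relate to annual ages, here is a sketch using a toy stand-in for the functional groups table. It assumes NumAgeClassSize gives the number of annual ages within each age class, that the table also has a NumCohorts column giving the number of age classes, and that true ages are numbered from 1; the group names and values are made up.

```r
# Toy stand-in for the functional groups table (illustrative values only)
fgs_toy <- data.frame(
  Name = c("Toy_fish_A", "Toy_fish_B"),
  NumCohorts = c(10, 10),        # assumed column: number of age classes
  NumAgeClassSize = c(2, 1)      # annual ages within each age class
)

# Total number of annual ages represented for each group
fgs_toy$n_true_ages <- fgs_toy$NumCohorts * fgs_toy$NumAgeClassSize

# Annual ages covered by age class 3 of Toy_fish_A (ages numbered from 1)
agecl <- 3
k <- fgs_toy$NumAgeClassSize[fgs_toy$Name == "Toy_fish_A"]
true_ages <- ((agecl - 1) * k + 1):(agecl * k)
```

With two annual ages per age class, age class 3 covers true ages 5 and 6 under these assumptions.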

```{r Natage-plot}

Natage <- readRDS(file.path(d.name, paste0(scenario.name, "Natageclcensus.rds")))

Natage_ss <- Natage[Natage$species == species_ss,]

Natageplot <- ggplot(Natage_ss, aes(x=agecl, y=atoutput)) +
  geom_point() +
  theme_tufte() +
  labs(subtitle = paste(scenario.name,
                        Natage_ss$species))

Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 1, scales="free")
Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 2, scales="free")
Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 3, scales="free")
Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 4, scales="free")

```

This really is still overly optimistic, but now that `calc_Z` seems to be working, we can feed these numbers into `calc_stage2age`. If the result matches the NOBA output ANNAGEBIO.nc, then we can assume that function is correct.

These are estimated true age comps for Cod in annual ages:

```{r plotannage}

survey_testNall <- readRDS(file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# numbers at agecl aggregated over layers, retain polygons (output of create_survey)
survey_testN_ss <- survey_testNall[survey_testNall$species == species_ss,]


# for single species, Z was calculated above as surveyresZ