This document sets out the high-level tasks which the `vg` development team hopes to accomplish in the next few versions of `vg` and beyond.

# By Time

These are the things we hope to achieve on several planning horizons:

### Next 3 Months

- [ ] Easy Giraffe [#3126](https://github.com/vgteam/vg/issues/3126)
    - [ ] Unified Indexing [#3144](https://github.com/vgteam/vg/issues/3144)
        - [ ] `vg index --giraffe` [#3144](https://github.com/vgteam/vg/issues/3144) (Adam, Jordan)
        - [ ] `vg index --mpmap` (Adam, Jordan)
        - [ ] `vg index --map` (Adam, Jordan)
        - [ ] Less cruft in `vg index` [#3144](https://github.com/vgteam/vg/issues/3144) (Adam, Jordan)
    - [ ] Document this in a way that is covered by tests [#3145](https://github.com/vgteam/vg/issues/3145)
    - [ ] Low-memory construction of indexes (100 genomes, 100M variants) for Giraffe from extended GFA text (with translation saving) (Jouni)
        - [ ] Get it working in 200 GB of memory (scales with graph node count) with the easy/current GBWT build implementation
            - [ ] Split by contig and merge?
            - [ ] Smart job scheduling to keep in a RAM budget
- [ ] GFA compatibility improvement push
    - [ ] Settle how to handle long nodes
    - [ ] Refer people to [GetBlunted](https://github.com/vgteam/GetBlunted) for non-blunt GFA import.
    - [ ] Accept non-numeric ids in GFA import (bonus points for preserving minigraph `s<ID>` as `<ID>` internally). (Jordan)
    - [ ] Check if text GFA is feasible at HPRC scale (Glenn)
    - [ ] Paths and haplotypes from GFA (Adam, Glenn)
        - [ ] Convert rGFA tags (SN SO SR...) into paths, for at least rank-0 (the primary reference)
        - [ ] Accept GFA-style paths that contain sample and haplotype ids instead of names for haplotype import.
        - [ ] Accept extended GFA haplotypes
            - [ ] Basic subpath support in vg
- [ ] Translations (Jouni to start)
    - [ ] Define and emit translation from chopped graph back to input GFA coordinates, for manual import
    - [ ] Saving, loading, and using coordinate translations to/from node-coalesced, string-ID'd input GFA space
    - [ ] Implicit node chopping on GFA input
    - Keep an eye on rGFA
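
To make the translation items above concrete, here is a minimal sketch of chopping long GFA segments into bounded-length nodes while recording a translation back to the input GFA coordinates. It is an illustration only: `chop_segments`, `to_gfa_coordinates`, and the in-memory translation layout are hypothetical names and are not part of vg or its file formats.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: chopped node id -> (original GFA segment name, start offset).
Translation = Dict[int, Tuple[str, int]]


def chop_segments(
    segments: List[Tuple[str, str]], max_length: int
) -> Tuple[Dict[int, str], Translation]:
    """Chop (name, sequence) segments into nodes of at most max_length bases.

    Segment names may be arbitrary strings; new nodes get fresh numeric ids.
    Segments with an empty sequence produce no nodes in this sketch.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    chopped: Dict[int, str] = {}
    translation: Translation = {}
    next_id = 1
    for name, sequence in segments:
        for offset in range(0, len(sequence), max_length):
            chopped[next_id] = sequence[offset:offset + max_length]
            translation[next_id] = (name, offset)
            next_id += 1
    return chopped, translation


def to_gfa_coordinates(
    translation: Translation, node_id: int, node_offset: int
) -> Tuple[str, int]:
    """Map an offset on a chopped node back to (segment name, segment offset)."""
    name, start = translation[node_id]
    return name, start + node_offset


if __name__ == "__main__":
    nodes, table = chop_segments([("s1", "ACGTACGTAC"), ("chrM_alt", "GG")], 4)
    print(nodes)
    print(to_gfa_coordinates(table, 3, 1))
```

Saving and loading such a table alongside the graph is what would let downstream output be reported in the string-ID'd input GFA space rather than in chopped node ids.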
    
Research topic: how do we get accurate long-read Giraffe to scale at Q20?

### Next 6 Months

- [ ] Full subpath support in vg (Adam, Jordan)
    - [ ] HG API support (see old GitHub issue on handlegraph)
    - [ ] Plugging in to tools
- [ ] Implement Distance Index 2, which also works as a snarl manager (Xian)
- [ ] Drop pinchesAndCacti and sonlib
    - [ ] Drop Cactus-library-based snarl finder (Adam)
- [ ] Adapt all snarl usage to go through new handle-based API (Adam, Jordan)
    - [ ] Shim `Snarl*` as a non-Protobuf adapter type?
- [ ] Transparently load GFA into `HashGraph` for any tool that reads a handle graph. (Glenn, Adam)
    - Probably better than a mapped GFA file backed graph
- [ ] Eliminate `vg::VG` (Jordan)
    - [ ] Steal all the things only it can do away from it
- [ ] Default everything to GAF instead of GAM
    - [ ] mpGAF (Jordan, Jonas)
- [ ] Long read Giraffe (Xian)

### Next Year

- [ ] Instant load/memory mapping
    - [ ] For tube map, to enable interactive whole-genome use (Future data vis enthusiast)
    - [ ] For Giraffe
    - [ ] For graph access from Python via `libbdsg`
- [ ] Algorithms in `libbdsg`, available from Python
- [ ] Get the GBWT build working in under 200 GB of memory on 100M variants, with a fancy disk-backed in-progress GBWT implementation (needs 300M random-access vectors that grow independently)
- [ ] Support Erik's multi-level graph format when mature
- [ ] Redesign and reorganize little tools (Where should each manipulation live? Should some just be scripts you write?)
    - [ ] `vg mod`
    - [ ] `vg chunk`
    - [ ] `vg circularize`
    - [ ] `vg view`
    - [ ] `vg paths`

## Running Projects

These are things we are working on, with no particular delivery date goal.

- [ ] Use of MCMC techniques in the genotyper with multipath alignments 

## Wishlist

These are things we would like to do eventually.

- Alignment
    - [ ] Adoption of the multipath alignment paradigm as the default
    - [ ] Graph-to-graph mapping (Xian)
- Variant Calling
    - [ ] Implementation of an HHGA-like machine learning based variant caller
    - [ ] Integration of variant calling and assembly polishing processes
    - [ ] Prune the zoo of TraversalFinders, and expose the useful ones to Python
- Visualization
    - [ ] Browser-free tube map
    - [ ] Better tube map handling of edge cases
        - [ ] No haplotypes on a node
        - [ ] Starting on a rare haplotype
- Infrastructure
    - [ ] Destructively modernize and unify IO
        - [ ] Eliminate VPKG framing if possible in favor of magic numbers everywhere
            - [ ] Resolve ensuing questions about GAM format
                - [ ] Just use GAF?
            - [ ] Handle formats like GFA that have to be sniffed manually
        - [ ] Just save from the object; no more `save_handle_graph`
        - [ ] Magic format registration for `libvgio` magic numbers for loading
        - [ ] Depend on `libvgio` in `libbdsg` to do the IO there and pick the right handle graph implementation
    - [ ] Replace Protobuf internal formats with faster ones
    - [ ] Revision of ID assignment logic to allow deterministic node breaking
    - [ ] Accept gzipped GFA if practical (can't mmap)
    - [ ] Improved HandleGraph API
        - [ ] Abstract away node boundaries
        - [ ] View all sequence as C++17 `std::string_view`s instead of sequence-owning strings
        - [ ] O(1) reverse complement DNAStringView
    - [ ] CMake-ify the main vg build
    - [ ] Eliminate old systems and their associated submodules, or factor them out into their own projects
        - [ ] `vg vectorize` could be its own project
            - [ ] Update `vg vectorize` to modern, system Vowpal Wabbit
            - [ ] Or pull it out into its own submodule and remove Vowpal Wabbit dependency from vg
        - [ ] Eliminate RocksDB from vg; everybody using `vg map` uses GCSA indexes now.
        - [ ] `vg genotype`
        - [ ] `vg srpe`
    - [ ] More cross-language support
        - [ ] Interoperate with Rust handle graph users/providers
        - [ ] Interoperate with Java handle graph users/providers
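
As an illustration of the "O(1) reverse complement" wishlist item under the improved HandleGraph API, the sketch below shows the general idea in Python: a non-owning view that flips an orientation flag instead of copying and complementing the sequence. The class name `DNAView` and its interface are hypothetical; an eventual C++ design may look quite different.

```python
# Complement table for the bases handled by this sketch (other characters pass through).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class DNAView:
    """Non-owning view over a DNA string with constant-time reverse complement."""

    __slots__ = ("_data", "_start", "_length", "_reverse")

    def __init__(self, data, start=0, length=None, reverse=False):
        self._data = data  # shared backing string, never copied by the view
        self._start = start
        self._length = len(data) - start if length is None else length
        self._reverse = reverse

    def __len__(self):
        return self._length

    def reverse_complement(self):
        # O(1): same backing string and bounds, opposite orientation flag.
        return DNAView(self._data, self._start, self._length, not self._reverse)

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError("DNAView index out of range")
        if self._reverse:
            # Read from the far end and complement the base on access.
            base = self._data[self._start + self._length - 1 - index]
            return base.translate(_COMPLEMENT)
        return self._data[self._start + index]

    def __str__(self):
        # Materializing the sequence is O(n); only creating the view is O(1).
        piece = self._data[self._start:self._start + self._length]
        return piece[::-1].translate(_COMPLEMENT) if self._reverse else piece


if __name__ == "__main__":
    view = DNAView("AACGT")
    print(str(view.reverse_complement()))
```

The cost of complementing is deferred to the point where individual bases are actually read, which is the trade-off such a view type would make.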